Date Topic covered Assignments
     
10-Jan Class syllabus
Introduction to information retrieval (all) Information Retrieval, http://en.wikipedia.org/wiki/Information_retrieval
Introduction to search engines (all) Search Engine, http://en.wikipedia.org/wiki/Search_engine
(all) List of search engines, http://en.wikipedia.org/wiki/List_of_search_engines
17-Jan Complexity and scalability of search (all) Scalability, http://en.wikipedia.org/wiki/Scalability
How much information (all) Big O, http://en.wikipedia.org/wiki/Big_O_notation
Exercise 1 (all) Unstructured data, http://en.wikipedia.org/wiki/Unstructured_data
(all) https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
24-Jan Concept of a document (all) Greengrass: 2.1.1-2.1.4, 2.1.6
Retrieval evaluation (all) van Rijsbergen: Ch 7 (up to Swets model), http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html
Enterprise, specialty (vertical) search (graduate students required) Manning, Raghavan, Schutze: Ch 8
(all) https://en.wikipedia.org/wiki/Enterprise_search
31-Jan Robots.txt (all) Web Crawler, http://en.wikipedia.org/wiki/Web_crawling
Web crawling (all) Robots.txt exclusion principle, http://www.robotstxt.org/
Specialty search engines (all) Heritrix, http://en.wikipedia.org/wiki/Heritrix
Exercise 1 due (all) Manning, Raghavan, Schutze: Ch 20.1-20.2
(all) "Vertical Search White Paper" Slack Barsinger, http://clgiles.ist.psu.edu/IST441/materials/papers/
(all) Search Engine Technology, http://en.wikipedia.org/wiki/Search_engine_technology
7-Feb University closed
Exercise 2
14-Feb Properties of text (all) Manning, Raghavan, Schutze: Ch 1, Ch 2.1-2.2, Ch 5.1
Query models (all) Greengrass: 3 & 4
21-Feb Classic information retrieval - vector models (all) Greengrass: 6.1-6.4
Similarity ranking (all) Manning, Raghavan, Schutze: Ch 6
Exercise 3
Exercise 2 due
28-Feb Indexing (all) Manning, Raghavan, Schutze: Ch 1, Ch 4.1-4.2
Specialty search engine presentations (graduate students required) Manning, Raghavan, Schutze: Ch 4
(all) Manning, Raghavan, Schutze: Ch 7.1
(graduate students required) Manning, Raghavan, Schutze: Ch 7
7-Mar Spring break
14-Mar Web search basics (all) Manning, Raghavan, Schutze: Ch 19.1-19.5
Exercise 3 due (graduate students required) Manning, Raghavan, Schutze: entire Ch 19
Exercise 4 (all) World Wide Web, http://en.wikipedia.org/wiki/World_Wide_Web
21-Mar Search engines (all) The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin, L. Page, http://clgiles.ist.psu.edu/IST441/materials/papers/
Link analysis - Google (all) http://www.webworkshop.net/pagerank.html
Exercise 5 (graduate students required) Manning, Raghavan, Schutze: Ch 21
28-Mar XML and the semantic web (all) "Introduction to XML," Read all in XML BASIC: http://www.w3schools.com/xml/default.asp
Issues in advanced search (all) "Semantic Web Tutorial" http://infomesh.net/2001/swintro/
Google custom search presentations (all) Metadata, http://en.wikipedia.org/wiki/Metadata
Exercise 4 due (all) Web2.0, http://en.wikipedia.org/wiki/Web2.0
4-Apr Review for exam
Status reports of project
Exercise 5 due
Exercise solution set available
11-Apr Exam
18-Apr Specialty search engine project presentations
Graduate student presentations - first draft
25-Apr Specialty search engine project presentations
Undergraduate presentations - final
Last day of class
30-Apr Search Engine Project Reports due