Date Topic covered Assignments
9-Jan Introduction to class (all) Information Retrieval,
Introduction to information retrieval (all) Search Engine,
Introduction to search engines (all) List of search engines,
16-Jan Complexity and scalability of search (Big O) (all) Scalability,
How much information (all) Big O,
Exercise 1 (all) Unstructured data,
Teams assigned (all)
23-Jan Concept of a document (all) Greengrass: 2.1.1-2.1.4, 2.1.6
Retrieval evaluation (all) van Rijsbergen: Ch 7 (up to Swets model)
Enterprise, specialty (vertical) search (graduate students required) Manning, Raghavan, Schutze: Ch 8
30-Jan Robots.txt (all) Web Crawler,
Web crawling (all) Robots.txt exclusion principle,
Specialty search engines (all) Scrapy -
Scrapy introduction (all) Manning, Raghavan, Schutze: Ch 20.1-20.2
Linux exercises (all)
Exercise 1 due (all) "Vertical Search White Paper" Slack Barsinger,
Exercise 2 (all) Search Engine Technology,
6-Feb Properties of text (all) Manning, Raghavan, Schutze: Ch 1, Ch 2.1-2.2, Ch 5.1
Team updates
13-Feb Classic information retrieval - vector models (all) Greengrass: 6.1-6.4
Similarity ranking (all) Manning, Raghavan, Schutze: Ch 6
Query models (all) Greengrass: 3 & 4
Exercise 2 due
Exercise 3
20-Feb Class canceled
27-Feb Specialty search engine proposal presentations
Indexing (all) Manning, Raghavan, Schutze: Ch 1, Ch 4.1-4.2
Exercise 3 due (graduate students required) Manning, Raghavan, Schutze: Ch 4
Exercise 4 (all) Manning, Raghavan, Schutze: Ch 7.1
Elasticsearch introduction (graduate students required) Manning, Raghavan, Schutze: Ch 7
6-Mar Spring break
13-Mar Web search basics (all) Manning, Raghavan, Schutze: Ch 19.1-19.5
How to build a Google Custom Search Engine
Elasticsearch introduction (graduate students required) Manning, Raghavan, Schutze: entire Ch 19
(all) World Wide Web
20-Mar Search engines (all) The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin, L. Page,
Link analysis - Google (all)
Exercise 4 due (graduate students required) Manning, Raghavan, Schutze: Ch 21
Exercise 5
Specialty Google Custom Search engine presentations
Customizing Elasticsearch
27-Mar XML and the semantic web (all) "Introduction to XML," Read all in XML BASIC:
Issues in advanced search (all) "Semantic Web Tutorial"
(all) Metadata,
(all) Web2.0,
3-Apr Review for exam
Team status reports of project
Exercise 5 due
Exercise solution set available
10-Apr Exam
17-Apr Work on specialty search engines
Graduate student presentations - first draft
24-Apr Specialty search engine project presentations
26-Apr Last day of class
29-Apr Search Engine Project Reports due at 8 am