Date Topic covered Assignments
     
9-Jan Introduction to class (all) Information Retrieval, http://en.wikipedia.org/wiki/Information_retrieval
Introduction to information retrieval (all) Search Engine, http://en.wikipedia.org/wiki/Search_engine
Introduction to search engines (all) List of search engines, http://en.wikipedia.org/wiki/List_of_search_engines
16-Jan Complexity and scalability of search (Big O) (all) Scalability, http://en.wikipedia.org/wiki/Scalability
How much information (all) Big O, http://en.wikipedia.org/wiki/Big_O_notation
Exercise 1 (all) Unstructured data, http://en.wikipedia.org/wiki/Unstructured_data
Teams assigned (all) https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
23-Jan Concept of a document (all) Greengrass: 2.1.1-2.1.4, 2.1.6
Retrieval evaluation (all) van Rijsbergen: Ch 7 (up to Swets model), http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html
Enterprise, specialty (vertical) search (graduate students required) Manning, Raghavan, Schutze: Ch 8
(all) https://en.wikipedia.org/wiki/Enterprise_search
30-Jan Robots.txt (all) Web Crawler, http://en.wikipedia.org/wiki/Web_crawling
Web crawling (all) Robots.txt exclusion principle, http://www.robotstxt.org/
Specialty search engines (all) Scrapy - https://doc.scrapy.org/en/latest/intro/tutorial.html
Scrapy introduction (all) Manning, Raghavan, Schutze: Ch 20.1-20.2
Linux exercises (all) http://www.ee.surrey.ac.uk/Teaching/Unix/
Exercise 1 due (all) "Vertical Search White Paper" Slack Barsinger, http://clgiles.ist.psu.edu/IST441/materials/papers/
Exercise 2 (all) Search Engine Technology, http://en.wikipedia.org/wiki/Search_engine_technology
6-Feb Properties of text (all) Manning, Raghavan, Schutze: Ch 1, Ch 2.1-2.2, Ch 5.1
Team updates
13-Feb Classic information retrieval - vector models (all) Greengrass: 6.1-6.4
Similarity ranking (all) Manning, Raghavan, Schutze: Ch 6
Query models (all) Greengrass: 3 & 4
Exercise 2 due
Exercise 3
20-Feb Class canceled
27-Feb Specialty search engine proposal presentations
Indexing (all) Manning, Raghavan, Schutze: Ch 1, Ch 4.1-4.2
Exercise 3 due (graduate students required) Manning, Raghavan, Schutze: Ch 4
Exercise 4 (all) Manning, Raghavan, Schutze: Ch 7.1
Elasticsearch introduction (graduate students required) Manning, Raghavan, Schutze: Ch 7
6-Mar Spring break
13-Mar Web search basics (all) Manning, Raghavan, Schutze: Ch 19.1-19.5
How to build a Google Custom Search Engine https://developer.google.com/custom-search/
Elasticsearch introduction (graduate students required) Manning, Raghavan, Schutze: entire Ch 19
(all) World Wide Web http://en.wikipedia.org/wiki/World_Wide_Web
20-Mar Search engines (all) The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin, L. Page, http://clgiles.ist.psu.edu/IST441/materials/papers/
Link analysis - Google (all) http://www.webworkshop.net/pagerank.html
Exercise 4 due (graduate students required) Manning, Raghavan, Schutze: Ch 21
Exercise 5
Specialty Google Custom Search engine presentations
Customizing Elasticsearch
27-Mar XML and the semantic web (all) "Introduction to XML," Read all in XML BASIC: http://www.w3schools.com/xml/default.asp
Issues in advanced search (all) "Semantic Web Tutorial" http://infomesh.net/2001/swintro/
(all) Metadata, http://en.wikipedia.org/wiki/Metadata
(all) Web2.0, http://en.wikipedia.org/wiki/Web2.0
3-Apr Review for exam
Team status reports of project
Exercise 5 due
Exercise solution set available
10-Apr Exam
17-Apr Work on specialty search engines
Graduate student presentations - first draft
24-Apr Specialty search engine project presentations
26-Apr Last day of class
29-Apr Search Engine Project Reports due at 8 am