Date Topic covered Assignments
     
9-Jan Class syllabus
Introduction to information retrieval (all) Information Retrieval, http://en.wikipedia.org/wiki/Information_retrieval
How much Information - 2013 (all) Unstructured data, http://en.wikipedia.org/wiki/Unstructured_data
Amount of information (all) https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
(all) Search Engine, http://en.wikipedia.org/wiki/Search_engine
(all) List of search engines, http://en.wikipedia.org/wiki/List_of_search_engines
16-Jan No classes
23-Jan Concept of a document (all) Greengrass: 2.1.1-2.1.4, 2.1.6
Retrieval evaluation (all) van Rijsbergen: Ch 7 (up to Swets model), http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html
Exercise 1 (graduate students required) Manning, Raghavan, Schutze: Ch 8
30-Jan Properties of text (all) Manning, Raghavan, Schutze: Ch 1, Ch 2.1-2.2, Ch 5.1
Query models (all) Greengrass: 3 & 4
6-Feb Robots.txt (all) Web Crawler, http://en.wikipedia.org/wiki/Web_crawling
Web crawling (all) Robots.txt exclusion principle, http://www.robotstxt.org/
Specialty search engines (all) Heritrix, http://en.wikipedia.org/wiki/Heritrix
Exercise 1 due (all) Manning, Raghavan, Schutze: Ch 20.1-20.2
(all) "Vertical Search White Paper," Slack Barsinger, http://clgiles.ist.psu.edu/IST441/materials/papers/
(all) Search Engine Technology, http://en.wikipedia.org/wiki/Search_engine_technology
13-Feb Classic information retrieval - vector models (all) Greengrass: 6.1-6.4
Similarity ranking (all) Manning, Raghavan, Schutze: Ch 6
Exercise 2
20-Feb Indexing (all) Manning, Raghavan, Schutze: Ch 1, Ch 4.1-4.2
(graduate students required) Manning, Raghavan, Schutze: Ch 4
(all) Manning, Raghavan, Schutze: Ch 7.1
(graduate students required) Manning, Raghavan, Schutze: Ch 7
27-Feb Web search basics (all) Manning, Raghavan, Schutze: Ch 19.1-19.5
Exercise 2 due (graduate students required) Manning, Raghavan, Schutze: entire Ch 19
Exercise 3 (all) World Wide Web http://en.wikipedia.org/wiki/World_Wide_Web
Specialty search engine proposal presentations
6-Mar Spring break
13-Mar Search engines (all) The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin, L. Page, http://clgiles.ist.psu.edu/IST441/materials/papers/
Link analysis - Google (all) http://www.webworkshop.net/pagerank.html
Specialty search engine proposal presentations - all others (graduate students required) Manning, Raghavan, Schutze: Ch 21
Exercise 4
20-Mar XML and the semantic web (all) "Introduction to XML," Read all in XML BASIC: http://www.w3schools.com/xml/default.asp
Issues in advanced search (all) "Semantic Web Tutorial" http://infomesh.net/2001/swintro/
Google custom search presentations (all) Metadata, http://en.wikipedia.org/wiki/Metadata
Exercise 3 due (all) Web2.0, http://en.wikipedia.org/wiki/Web2.0
27-Mar Recommender systems (all) http://en.wikipedia.org/wiki/Recommender_system
Complexity and scalability of search (all) http://en.wikipedia.org/wiki/Collaborative_filtering
Exercise 4 due (graduate students and extra credit) Scalability, http://en.wikipedia.org/wiki/Scalability
Exercise 5 (graduate students and extra credit) Big O, http://en.wikipedia.org/wiki/Big_O_notation
3-Apr Review for exam
Status reports of project
10-Apr Exam
Exercise 5 due
17-Apr Specialty search projects catch-up and review
Team order of presentations assigned
24-Apr Specialty search engine presentations
Last day of class
1-May Search Engine Project Reports due
Solr/Lucene https://drive.google.com/drive/folders/0B65bQnJhC1mvZmI4cnNqaGY1RUU