Date Topic covered Assignments
     
23-Aug Introduction to class (all) Information Retrieval, http://en.wikipedia.org/wiki/Information_retrieval
Introduction to information retrieval (all) Search Engine, http://en.wikipedia.org/wiki/Search_engine
Introduction to search engines (all) List of search engines, http://en.wikipedia.org/wiki/List_of_search_engines
How Google works? https://www.youtube.com/watch?v=0eKVizvYSUQ
Fill out student information form
30-Aug Class cancelled
6-Sep Complexity and scalability of search (Big O) (all) Scalability, http://en.wikipedia.org/wiki/Scalability
How much information (all) Big O, http://en.wikipedia.org/wiki/Big_O_notation
Exercise 1 (all) Unstructured data, http://en.wikipedia.org/wiki/Unstructured_data
(all) https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm
(all) https://www.youtube.com/watch?v=v4cd1O4zkGw
13-Sep Concept of a document (all) Greengrass: 2.1.1-2.1.4, 2.1.6
Retrieval evaluation (all) van Rijsbergen: Ch 7 (up to Swets model), http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html
Enterprise, specialty (vertical) search (graduate students required) Manning, Raghavan, Schutze: Ch 8
Teams assigned (all) https://en.wikipedia.org/wiki/Enterprise_search
20-Sep Robots.txt (all) Web Crawler, http://en.wikipedia.org/wiki/Web_crawling
Web crawling (all) Robots.txt exclusion principle, http://www.robotstxt.org/
Specialty search engines (all) Scrapy - https://doc.scrapy.org/en/latest/intro/tutorial.html
Scrapy introduction (all) Manning, Raghavan, Schutze: Ch 20.1-20.2
Linux exercises (all) http://www.ee.surrey.ac.uk/Teaching/Unix/
Exercise 1 due (all) "Vertical Search White Paper," Slack Barsinger, http://clgiles.ist.psu.edu/IST441/materials/papers/
Exercise 2 (all) Search Engine Technology, http://en.wikipedia.org/wiki/Search_engine_technology
27-Sep Properties of text (all) Manning, Raghavan, Schutze: Ch 1, Ch 2.1-2.2, Ch 5.1
4-Oct Classic information retrieval - vector models (all) Greengrass: 6.1-6.4
Similarity ranking (all) Manning, Raghavan, Schutze: Ch 6
Query models (all) Greengrass: 3 & 4
Exercise 2 due
Exercise 3
Graduate student project presentation
11-Oct Specialty search engine updates (all) Manning, Raghavan, Schutze: Ch 1, Ch 4.1-4.2
Indexing (graduate students required) Manning, Raghavan, Schutze: Ch 4
Google Custom Search Engine (all) Manning, Raghavan, Schutze: Ch 7.1
(Programmable Search Engine) (graduate students required) Manning, Raghavan, Schutze: Ch 7
Exercise 4 https://programmablesearchengine.google.com/about/
18-Oct Web search basics (all) Manning, Raghavan, Schutze: Ch 19.1-19.5
Elasticsearch introduction (graduate students required) Manning, Raghavan, Schutze: entire Ch 19
Exercise 3 due (all) World Wide Web http://en.wikipedia.org/wiki/World_Wide_Web
25-Oct Search engines (all) The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin, L. Page, http://clgiles.ist.psu.edu/IST441/materials/papers/
Link analysis - Google (all) http://www.webworkshop.net/pagerank.html
Exercise 4 due (graduate students required) Manning, Raghavan, Schutze: Ch 21
Exercise 5
Specialty Google Programmable Search Engine presentations
1-Nov XML and the semantic web (all) "Introduction to XML," Read all in XML BASIC: http://www.w3schools.com/xml/default.asp
Issues in advanced search (all) "Semantic Web Tutorial" http://infomesh.net/2001/swintro/
Customizing Elasticsearch (all) Metadata, http://en.wikipedia.org/wiki/Metadata
Ranking by Transformers (all) Web2.0, http://en.wikipedia.org/wiki/Web2.0
(all) https://www.sbert.net/examples/applications/retrieve_rerank/README.html
8-Nov Review for exam Old exams available online
Team status reports of project
Exercise 5 due
Exercise solution set available
15-Nov Exam
22-Nov Thanksgiving break
29-Nov Work on specialty search engines
Project and search engine updates
6-Dec Specialty search engine project presentations
9-Dec Last day of classes
Peer evaluations due
12-Dec 2 PDFs of Search Engine Project Reports due at 8 AM
Please submit a PDF via Canvas and 2 hard copies of the PDF to Westgate E350