| Date | Topic covered | Assignments |
| 23-Aug | Introduction to class | (all) Information Retrieval, http://en.wikipedia.org/wiki/Information_retrieval |
| | Introduction to information retrieval | (all) Search Engine, http://en.wikipedia.org/wiki/Search_engine |
| | Introduction to search engines | (all) List of search engines, http://en.wikipedia.org/wiki/List_of_search_engines |
| | How Google works | https://www.youtube.com/watch?v=0eKVizvYSUQ |
| | Fill out student information form | |
| 30-Aug | Class cancelled | |
| 6-Sep | Complexity and scalability of search (Big O) | (all) Scalability, http://en.wikipedia.org/wiki/Scalability |
| | How much information | (all) Big O, http://en.wikipedia.org/wiki/Big_O_notation |
| | Exercise 1 | (all) Unstructured data, http://en.wikipedia.org/wiki/Unstructured_data |
| | | (all) https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm |
| | | (all) https://www.youtube.com/watch?v=v4cd1O4zkGw |
| 13-Sep | Concept of a document | (all) Greengrass: 2.1.1-2.1.4, 2.1.6 |
| | Retrieval evaluation | (all) van Rijsbergen: Ch 7 (up to Swets model), http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html |
| | Enterprise, specialty (vertical) search | (graduate students required) Manning, Raghavan, Schutze: Ch 8 |
| | Teams assigned | (all) https://en.wikipedia.org/wiki/Enterprise_search |
| 20-Sep | Robots.txt | (all) Web Crawler, http://en.wikipedia.org/wiki/Web_crawling |
| | Web crawling | (all) Robots.txt exclusion principle, http://www.robotstxt.org/ |
| | Specialty search engines | (all) Scrapy, https://doc.scrapy.org/en/latest/intro/tutorial.html |
| | Scrapy introduction | (all) Manning, Raghavan, Schutze: Ch 20.1-20.2 |
| | Linux exercises | (all) http://www.ee.surrey.ac.uk/Teaching/Unix/ |
| | Exercise 1 due | (all) "Vertical Search White Paper," Slack Barsinger, http://clgiles.ist.psu.edu/IST441/materials/papers/ |
| | Exercise 2 | (all) Search Engine Technology, http://en.wikipedia.org/wiki/Search_engine_technology |
| 27-Sep | Properties of text | (all) Manning, Raghavan, Schutze: Ch 1, Ch 2.1-2.2, Ch 5.1 |
| 4-Oct | Classic information retrieval - vector models | (all) Greengrass: 6.1-6.4 |
| | Similarity ranking | (all) Manning, Raghavan, Schutze: Ch 6 |
| | Query models | (all) Greengrass: 3 & 4 |
| | Exercise 2 due | |
| | Exercise 3 | |
| | Graduate student project presentation | |
| 11-Oct | Specialty search engine updates | (all) Manning, Raghavan, Schutze: Ch 1, Ch 4.1-4.2 |
| | Indexing | (graduate students required) Manning, Raghavan, Schutze: Ch 4 |
| | Google Custom Search Engine | (all) Manning, Raghavan, Schutze: Ch 7.1 |
| | (Programmable Search Engine) | (graduate students required) Manning, Raghavan, Schutze: Ch 7 |
| | Exercise 4 | https://programmablesearchengine.google.com/about/ |
| 18-Oct | Web search basics | (all) Manning, Raghavan, Schutze: Ch 19.1-19.5 |
| | Elasticsearch introduction | (graduate students required) Manning, Raghavan, Schutze: entire Ch 19 |
| | Exercise 3 due | (all) World Wide Web, http://en.wikipedia.org/wiki/World_Wide_Web |
| 25-Oct | Search engines | (all) The Anatomy of a Large-Scale Hypertextual Web Search Engine, S. Brin, L. Page, http://clgiles.ist.psu.edu/IST441/materials/papers/ |
| | Link analysis - Google | (all) http://www.webworkshop.net/pagerank.html |
| | Exercise 4 due | (graduate students required) Manning, Raghavan, Schutze: Ch 21 |
| | Exercise 5 | |
| | Specialty Google Programmable Search Engine presentations | |
| 1-Nov | XML and the semantic web | (all) "Introduction to XML," Read all in XML BASIC: http://www.w3schools.com/xml/default.asp |
| | Issues in advanced search | (all) "Semantic Web Tutorial," http://infomesh.net/2001/swintro/ |
| | Customizing Elasticsearch | (all) Metadata, http://en.wikipedia.org/wiki/Metadata |
| | Ranking by Transformers | (all) Web 2.0, http://en.wikipedia.org/wiki/Web2.0 |
| | | (all) https://www.sbert.net/examples/applications/retrieve_rerank/README.html |
| 8-Nov | Review for exam | Old exams available online |
| | Team status reports of project | |
| | Exercise 5 due | |
| | Exercise solution set available | |
| 15-Nov | Exam | |
| 22-Nov | Thanksgiving break | |
| 29-Nov | Work on specialty search engines | |
| | Project and search engine updates | |
| 6-Dec | Specialty search engine project presentations | |
| 9-Dec | Last day of classes | |
| | Peer evaluations due | |
| 12-Dec | 2 PDFs of Search Engine Project Reports due at 8 AM | |
| | Please submit a PDF via Canvas and 2 hard copies of the PDF to Westgate E350 | |