Date |
Topic covered |
Assignments |
|
|
|
23-Aug |
Introduction to class |
(all) Information Retrieval,
http://en.wikipedia.org/wiki/Information_retrieval |
|
Introduction to information retrieval |
(all) Search Engine,
http://en.wikipedia.org/wiki/Search_engine |
|
Introduction to search engines |
(all) List of search engines,
http://en.wikipedia.org/wiki/List_of_search_engines |
|
How Google works |
https://www.youtube.com/watch?v=0eKVizvYSUQ |
|
Fill out student information form |
|
|
|
|
30-Aug |
Class cancelled |
|
|
|
|
6-Sep |
Complexity and scalability of search (Big O) |
(all) Scalability, http://en.wikipedia.org/wiki/Scalability |
|
How much information |
(all) Big O, http://en.wikipedia.org/wiki/Big_O_notation |
|
Exercise 1 |
(all) Unstructured data,
http://en.wikipedia.org/wiki/Unstructured_data |
|
|
(all)
https://www.emc.com/leadership/digital-universe/2014iview/executive-summary.htm |
|
|
(all) https://www.youtube.com/watch?v=v4cd1O4zkGw |
|
|
|
13-Sep |
Concept of a document |
(all) Greengrass: 2.1.1-2.1.4, 2.1.6 |
|
Retrieval evaluation |
(all) van Rijsbergen: Ch 7 (up to Swets model)
http://www.dcs.gla.ac.uk/Keith/Chapter.7/Ch.7.html |
|
Enterprise, specialty (vertical) search |
(graduate students required) Manning, Raghavan, Schutze: Ch 8 |
|
Teams assigned |
(all) https://en.wikipedia.org/wiki/Enterprise_search |
|
|
|
20-Sep |
Robots.txt |
(all) Web Crawler, http://en.wikipedia.org/wiki/Web_crawling |
|
Web crawling |
(all) Robots exclusion protocol,
http://www.robotstxt.org/ |
|
Specialty search engines |
(all) Scrapy -
https://doc.scrapy.org/en/latest/intro/tutorial.html |
|
Scrapy introduction |
(all) Manning, Raghavan, Schutze: Ch 20.1-20.2 |
|
Linux exercises |
(all) http://www.ee.surrey.ac.uk/Teaching/Unix/ |
|
Exercise 1 due |
(all) "Vertical Search White Paper," Slack Barsinger,
http://clgiles.ist.psu.edu/IST441/materials/papers/ |
|
Exercise 2 |
(all) Search Engine Technology,
http://en.wikipedia.org/wiki/Search_engine_technology |
|
|
|
27-Sep |
Properties of text |
(all) Manning, Raghavan, Schutze: Ch 1, Ch 2.1-2.2, Ch 5.1 |
|
|
|
4-Oct |
Classic information retrieval - vector models |
(all) Greengrass: 6.1-6.4 |
|
Similarity ranking |
(all) Manning, Raghavan, Schutze: Ch 6 |
|
Query models |
(all) Greengrass: 3 & 4 |
|
Exercise 2 due |
|
|
Exercise 3 |
|
|
Graduate student project presentation |
|
|
|
|
11-Oct |
Specialty search engine updates |
(all) Manning, Raghavan, Schutze: Ch 1, Ch 4.1-4.2 |
|
Indexing |
(graduate students required) Manning, Raghavan, Schutze: Ch 4 |
|
Google Custom Search Engine |
(all) Manning, Raghavan, Schutze: Ch 7.1 |
|
(Programmable Search Engine) |
(graduate students required) Manning, Raghavan, Schutze: Ch 7 |
|
Exercise 4 |
https://programmablesearchengine.google.com/about/ |
|
|
|
18-Oct |
Web search basics |
(all) Manning, Raghavan, Schutze: Ch 19.1-19.5 |
|
Elasticsearch introduction |
(graduate students required) Manning, Raghavan, Schutze:
entire Ch 19 |
|
Exercise 3 due |
(all) World Wide Web
http://en.wikipedia.org/wiki/World_Wide_Web |
|
|
|
25-Oct |
Search engines |
(all) The Anatomy of a Large-Scale Hypertextual Web Search
Engine, S. Brin, L. Page, http://clgiles.ist.psu.edu/IST441/materials/papers/ |
|
Link analysis - Google |
(all) http://www.webworkshop.net/pagerank.html |
|
Exercise 4 due |
(graduate students required) Manning, Raghavan, Schutze: Ch 21 |
|
Exercise 5 |
|
|
Specialty Google Programmable Search Engine presentations |
|
|
|
|
1-Nov |
XML and the semantic web |
(all) "Introduction to XML," Read all in XML BASIC:
http://www.w3schools.com/xml/default.asp |
|
Issues in advanced search |
(all) "Semantic Web Tutorial"
http://infomesh.net/2001/swintro/ |
|
Customizing Elasticsearch |
(all) Metadata, http://en.wikipedia.org/wiki/Metadata |
|
Ranking by Transformers |
(all) Web 2.0, http://en.wikipedia.org/wiki/Web2.0 |
|
|
(all)
https://www.sbert.net/examples/applications/retrieve_rerank/README.html |
|
|
|
8-Nov |
Review for exam |
Old exams available online |
|
Team status reports of project |
|
|
Exercise 5 due |
|
|
Exercise solution set
available |
|
|
|
|
15-Nov |
Exam |
|
|
|
|
22-Nov |
Thanksgiving break |
|
|
|
|
29-Nov |
Work on specialty search engines |
|
|
Project and search engine updates |
|
|
|
|
6-Dec |
Specialty search engine project presentations |
|
|
|
|
9-Dec |
Last day of classes |
|
|
Peer evaluations due |
|
|
|
|
12-Dec |
2 PDFs of Search Engine Project Reports due at 8 AM |
|
|
Please submit a PDF via
Canvas and 2 hard copies of the PDF to Westgate E350 |
|
|
|
|