IST 441  Information Retrieval and Search Engines
 
Spring 2014


Instructor: Dr. C. Lee Giles

TA: Sagnik Ray Choudhury


This course can be counted for the IST 402 requirement.

Time and Place: Spring, 2014. 3:35-6:35, Monday, 205 IST Bldg.

Office Hours:  Dr. Lee Giles, IST 311A, 1-2 PM Tuesday, or TBA
                        TA: Sagnik Choudhury, IST 310, 1-2 PM Wednesday, 3-5 PM Thursday, or TBA


Course Overview:

This is a three-hour course for juniors, seniors, and graduate students that meets once a week. The course will cover: organization, representation, and access to information; categorization, indexing, and content analysis; data structures for unstructured data; design and maintenance of such databases, indexing and indexes, retrieval and classification schemes; use of codes, formats, and standards; analysis, construction, and evaluation of search and navigation techniques; and search engines and how they relate to the above.

Course Mission Statement:

This course is intended to prepare students to understand, design, develop and use information retrieval and search systems.

Course Prerequisites:


IST students should have taken IST 210 and IST 240.  IST 220 and IST 230 are also useful. Other students should consult with the instructor.

Schedule (syllabus):  This schedule is subject to change. Please check it on a regular basis for assignments. The reading list is here; most classes will have online handouts. It is the student's responsibility to download that material.

Course Materials and References: Course materials can be found here. There will also be links on the schedule.


Grading:

Search Project & Report   40 points
Exam                      30 points
Exercises                 25 points
Class Participation        5 points

The project is a group activity unless the instructor approves otherwise. Unless stated otherwise, all exercise assignments are individual assignments.

Late Policy: All exercise solution sets are to be submitted as hard copies and are due at midnight on the date assigned. Starting right after the required submission date, 1/3 of the grade will be deducted for each day late until no credit remains.

For more information on any of the above, please contact Lee Giles.


Texts and Readings:  No text is required. Online papers and chapters and selections from online books will be used.

The reading list is on the schedule below.  We will use chapters from Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press, 2008; selections from Information Retrieval: A Survey by Ed Greengrass, 2000; and the classic Information Retrieval by C. J. van Rijsbergen, Butterworths, 1979.

There are many other useful texts on both search and information retrieval. A good, if somewhat outdated, selection can be found in the resources section of the first book.

Popular, less technical books that you may find useful and informative include:

John Battelle, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, Portfolio, 2005.
  
Ian Witten, Marco Gori, Teresa Numerico, Web Dragons: Inside the Myths of Search Engine Technology, Morgan Kaufmann, 2006.

We will be using the popular open-source enterprise search platform Solr, which is built on the even more popular Lucene indexer.



Email: All email to the instructor and TA about this class should contain "IST441" in the subject line.  For example, the subject line might read "IST441: Question about ....".  Email without this identifier might be deleted by spam filters or placed in a folder to be read at a later date.  Email with the identifier will usually be read within 24 hours of being received.


Acknowledgements!


Reuse: Materials from this course can be publicly reused in other courses.