IST 441  Information Retrieval and Search Engines
 
Spring 2008


Dr. C. Lee Giles


TA: Yang Sun


This course can be counted for the IST 402 requirement.

Time and Place: Spring, 2008. 4-7 pm Monday (pizza served during break), 208 IST Bldg.

Office Hours:  Dr. Lee Giles,  IST 311A, 2-3 pm, Wednesday, or by request.
                         Yang Sun, IST 310, 3-4 pm, Tuesday and Thursday, or by request.


Course Overview:

This is a three-hour course for juniors, seniors, and graduate students that meets once a week. The course will cover: organization, representation, and access to information; categorization, indexing, and content analysis; data structures for unstructured data; design and maintenance of databases of such data, indexing and indexes, retrieval and classification schemes; use of codes, formats, and standards; analysis, construction, and evaluation of search and navigation techniques; and search engines and how they relate to the above.

This is an introductory course for IST students covering the practices, issues, and theoretical foundations of organizing and analyzing information and information content for the purpose of providing intellectual access to textual and non-textual information resources. This course will introduce students to the principles of information storage and retrieval systems and databases. Students will learn how effective information search and retrieval is interrelated with the organization and description of the information to be retrieved. Students will also learn to use a set of tools and procedures for organizing information, will become familiar with the techniques involved in conducting effective searches of print and online information resources, and will build a vertical/specialty search engine.


Course Mission Statement:

This course is intended to prepare students to design, develop, and use information systems. We will explore the practices, issues, and theoretical foundations of organizing and analyzing information and information content for the purpose of providing intellectual access to textual and non-textual information resources. This course will introduce students to the principles of information storage and retrieval systems and databases. Students will learn how effective information search and retrieval is interrelated with the organization and description of the information to be retrieved. Students will also learn to use a set of tools and procedures for organizing information, and will become familiar with the techniques involved in conducting effective searches of print and online information resources. The course also introduces the major types of information retrieval systems and search engines, the different theoretical foundations underlying these systems, and the methods and measures that can be used to evaluate them.

These topics will be examined through readings, discussion, hands-on experience using and constructing a search engine, and exercises designed to help explore the capabilities and utility of different retrieval systems. The class project will consist of building a search engine on a specific topic using open-source search engine software.


Course Prerequisites:

IST students should have taken IST 210 and IST 240.  IST 220 and IST 230 are also useful.


Grading:      

Exam: 30 points
Search Project & Report: 35 points
Exercises: 25 points
Class Participation: 10 points

The project is a group activity. Unless stated otherwise, all exercise assignments are individual assignments.

Late Policy: All exercise solution sets must be submitted as hard copies and are due at the start of class on the date assigned. Starting right after that deadline, 1/3 of the grade will be deducted for each day late until no credit remains.
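
As an illustration only, the deduction can be sketched in a few lines of Python. The function name late_score is hypothetical, and the sketch assumes two things the policy does not spell out: that each started day after the deadline counts as one day late, and that each deduction is one third of the score the work would have earned on time (not one third of what remains).

```python
from fractions import Fraction


def late_score(raw_score, days_late):
    """Return the score left after the late deduction.

    Assumptions (not stated in the policy): days_late is a whole number
    of days, and each day removes one third of the on-time score.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # Fraction keeps the thirds exact; the credit never drops below zero.
    remaining = max(Fraction(0), 1 - Fraction(days_late, 3))
    return raw_score * remaining
```

Under these assumptions, a solution set that would have earned 24 points on time is worth 16 points one day late, 8 points two days late, and nothing from the third day on.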

For more information on any of the above, please contact Lee Giles.


Texts:  No text is required. Online papers, chapters, and selections from online books will be used.

We will use chapters from Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Cambridge University Press, 2008; selections from Information Retrieval: A Survey by Ed Greengrass, 2000; and the classic Information Retrieval by C. J. van Rijsbergen, Butterworths, 1979.

There are many other useful texts on both search and information retrieval. A good selection can be found in the resources for the first book listed above.

Popular, less technical books that you may find useful and informative are:

John Battelle, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, Portfolio, 2005.
  
Ian Witten, Marco Gori, Teresa Numerico, Web Dragons: Inside the Myths of Search Engine Technology, Morgan Kaufmann, 2006.

For those of you who want to learn more about Lucene, please see Lucene in Action by Otis Gospodnetic and Erik Hatcher, 2005.



Schedule:
  This schedule is subject to change. Please check it on a regular basis for assignments. Some classes will have online handouts. It is the student's responsibility to download that material.


Course Materials and References: Course materials can be found here. There will also be links on the schedule.


Acknowledgements!