IST 441 – Information Retrieval and Search Engines
Fall 2006


Dr. C. Lee Giles


TA: Bi Chen and Jian Huang


This course can be counted for the IST 402 requirement.

Time and Place: Fall, 2006. 4-7 pm Monday (pizza served during break), 210 IST Bldg.

Office Hours:  Dr. Lee Giles,  IST 311A  from 3 to 4 PM Tuesday and Wednesday. Also, upon request.

                         Bi Chen, IST 313D, no 14, from 3 to 4 PM Thursday and upon request.


Course Overview:

This is a three-hour course for juniors, seniors and graduate students that meets once a week. The course will cover: organization, representation, and access to information; categorization, indexing, and content analysis; data structures for unstructured data; design and maintenance of such databases, indexing and indexes, retrieval and classification schemes; use of codes, formats, and standards; analysis, construction and evaluation of search and navigation techniques; and search engines and how they relate to the above.

This is an introductory course for IST students covering the practices, issues, and theoretical foundations of organizing and analyzing information and information content for the purpose of providing intellectual access to textual and non-textual information resources. This course will introduce students to the principles of information storage and retrieval systems and databases. Students will learn how effective information search and retrieval is interrelated with the organization and description of the information to be retrieved. Students will also learn to use a set of tools and procedures for organizing information, will become familiar with the techniques involved in conducting effective searches of print and online information resources, and will build a search engine.


Course Mission Statement:

This course is intended to prepare students to design, develop and use information systems. We will explore the practices, issues and theoretical foundations of organizing and analyzing information and information content for the purpose of providing intellectual access to textual and non-textual information resources. This course will introduce students to the principles of information storage and retrieval systems and databases. They will learn how effective information search and retrieval is interrelated with the organization and description of the information to be retrieved. Students will also learn to use a set of tools and procedures for organizing information, and will become familiar with the techniques involved in conducting effective searches of print and online information resources. The course also introduces the major types of information retrieval systems and search engines, the different theoretical foundations underlying these systems, and the methods and measures that can be used to evaluate them.

These topics will be examined through readings, discussion, hands-on experience using and constructing a search engine, and through exercises designed to help explore the capabilities and utility of different retrieval systems.  The class project will consist of building a search engine on a specific topic using open source search engine software.


Course Prerequisites:

IST students must have taken IST 210 and IST 240.  IST 220 and IST 230 are also useful.


Grading:      

Exam: 30 points
Project & Report: 30 points
Research Presentation: 25 points
Exercises: 15 points

The project and research presentation are group activities. All exercise assignments are for individuals unless stated otherwise.

Late Policy: All exercise solution sets must be submitted as hard copies and are due at the start of class on the date assigned. Starting immediately after the required submission date, 1/3 of the grade will be deducted for each day late, until no credit remains.

For more information on any of the above, please contact Lee Giles.


Texts:

Required: John Battelle, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, Portfolio, 2005.

(Recommended for undergrads) Robert R. Korfhage, Information Storage and Retrieval, Wiley, 1997.  Material may be drawn from other texts and papers.

(Recommended for graduate students) Ricardo A. Baeza-Yates, Berthier Ribeiro-Neto, Modern Information Retrieval, ACM Press, 1999.


Other useful texts:

Brand new! C.D. Manning, P. Raghavan, H. Schütze, Introduction to Information Retrieval, Cambridge UK, 2007. (downloadable!!!)

C. J. van Rijsbergen, Information Retrieval, Butterworths, 1975. (still appropriate in some ways and downloadable!!)

Richard K. Belew, Finding Out About: A Cognitive Perspective on Search Engine Technology and WWW, Cambridge, 2000. 

David A. Grossman, Ophir Frieder, Information Retrieval: Algorithms and Heuristics, Springer, 2004.


Schedule:
  This schedule is subject to change. Please check it on a regular basis for assignments. Some classes will have online handouts. It is the student’s responsibility to download that material.


Course Materials and References: Here is a link to some of our course materials. There could also be links under the schedule.


Acknowledgements!