IST 441 – Information Retrieval and Search Engines
Fall 2006
This course can be counted for the IST 402 requirement.
Time and Place: Fall 2006. Monday, 4-7 PM (pizza served during break), 210 IST Bldg.
Office Hours: Dr. Lee Giles, IST 311A, 3 to 4 PM Tuesday and Wednesday, and upon request.
Bi Chen, IST 313D, no. 14, 3 to 4 PM Thursday, and upon request.
Course Overview:
This is a three-hour course for juniors, seniors and graduate students that meets once a week. The course will cover: organization, representation, and access to information; categorization, indexing, and content analysis; data structures for unstructured data; design and maintenance of such databases, indexing and indexes, retrieval and classification schemes; use of codes, formats, and standards; analysis, construction and evaluation of search and navigation techniques; and search engines and how they relate to the above.
This is an introductory course for IST students covering the practices,
issues, and theoretical foundations of organizing and analyzing
information and information content for the purpose of providing
intellectual access to textual and non-textual information resources.
This course will introduce students to the principles of information
storage and retrieval systems and databases. Students will learn how
effective information search and retrieval is interrelated with the
organization and description of information to be retrieved. Students
will also learn to use a set of tools and procedures for organizing
information, will become familiar with the techniques involved in
conducting effective searches of print and online information resources
and will build a search engine.
Course Mission Statement:
This course is intended to prepare students to design, develop and use
information systems. We will explore the practices, issues and
theoretical foundations of organizing and analyzing information and
information content for the purpose of providing intellectual access to
textual and non-textual information resources. This course will
introduce students to the principles of information storage and
retrieval systems and databases. They will learn how effective
information search and retrieval is interrelated with the organization
and description of information to be retrieved. Students will also
learn to use a set of tools and procedures for organizing information,
and will become familiar with the techniques involved in conducting
effective searches of print and online information resources. The
course also introduces the major types of information retrieval
systems and search engines, the different theoretical foundations
underlying these systems, and the methods and measures that can be used
to evaluate them.
These topics will be examined through readings, discussion, hands-on
experience using and constructing a search engine, and through
exercises designed to help explore the capabilities and utility of
different retrieval systems. The class project will consist of
building a search engine on a specific topic using open source search
engine software.
Course Prerequisites:
IST students must have taken IST 210 and IST 240. IST 220 and IST
230 are also useful.
Grading:
The project and research presentation are group activities. Unless
stated otherwise, all exercise assignments are to be done individually.
Late Policy: All exercise solution sets must be submitted as hard
copies and are due at the start of class on the date assigned. Starting
immediately after the submission deadline, 1/3 of the grade will be
deducted for each day late until no credit remains.
For more information on any of the above, please contact Lee Giles.
Texts:
Required: John Battelle, The Search:
How Google and Its Rivals Rewrote the Rules of Business and Transformed
Our Culture, Portfolio, 2005.
(Recommended for undergrads) Robert R. Korfhage, Information Storage
and Retrieval, Wiley, 1997. Material may be drawn from other
texts and papers.
(Recommended for Graduate students) Ricardo A. Baeza-Yates, Berthier
Ribeiro-Neto, Modern Information Retrieval, ACM Press, 1999.
Other useful texts:
Brand new! C.D. Manning, P. Raghavan, H. Schütze, Introduction to
Information Retrieval, Cambridge UK, 2007. (downloadable)
C. J. van Rijsbergen, Information Retrieval, Butterworths, 1975.
(still appropriate in some ways, and downloadable)
Richard K. Belew, Finding Out About: A Cognitive Perspective on Search
Engine Technology and WWW, Cambridge, 2000.
David A. Grossman, Ophir Frieder, Information Retrieval: Algorithms and
Heuristics, Springer, 2004.
Schedule:
This schedule is subject to change. Please check it on a regular basis
for assignments. Some classes will have online handouts. It is the
student’s responsibility to download that material.
Course Materials and References: Here is a link to some of
our course materials. There may also be links under the schedule.
Acknowledgements!