IST 441 Information Retrieval and Search Engines
Spring 2008
This course can be counted for the IST 402 requirement.
Time and Place: Spring 2008, Mondays 4-7 pm (pizza served during break), 208 IST Bldg.
Office Hours: Dr. Lee Giles, IST 311A, 2-3 pm Wednesday, or by request.
Yang Sun, IST 310, 3-4 pm Tuesday and Thursday, or by request.
Course Overview:
This is a three-hour course for juniors, seniors, and graduate students
that meets once a week. The course will cover: organization,
representation, and access to information; categorization, indexing,
and content analysis; data structures for unstructured data; design and
maintenance of such databases, indexing and indexes, retrieval and
classification schemes; use of codes, formats, and standards; analysis,
construction, and evaluation of search and navigation techniques; and
search engines and how they relate to the above.
This is an introductory course for IST students covering the practices,
issues, and theoretical foundations of organizing and analyzing
information and information content for the purpose of providing
intellectual access to textual and non-textual information resources.
This course will introduce students to the principles of information
storage and retrieval systems and databases. Students will learn how
effective information search and retrieval is interrelated with the
organization and description of information to be retrieved. Students
will also learn to use a set of tools and procedures for organizing
information, will become familiar with the techniques involved in
conducting effective searches of print and online information resources
and will build a vertical/specialty search engine.
Course Mission Statement:
This course is intended to prepare students to design, develop and use
information systems. We will explore the practices, issues and
theoretical foundations of organizing and analyzing information and
information content for the purpose of providing intellectual access to
textual and non-textual information resources. This course will
introduce students to the principles of information storage and
retrieval systems and databases. They will learn how effective
information search and retrieval is interrelated with the organization
and description of information to be retrieved. Students will also
learn to use a set of tools and procedures for organizing information,
and will become familiar with the techniques involved in conducting
effective searches of print and online information resources. The
course also introduces the major types of information retrieval
systems and search engines, the different theoretical foundations
underlying these systems, and the methods and measures that can be used
to evaluate them.
These topics will be examined through readings, discussion, hands-on
experience using and constructing a search engine, and through
exercises designed to help explore the capabilities and utility of
different retrieval systems. The class project will consist of
building a search engine on a specific topic using open-source search
engine software.
Course Prerequisites:
IST students should have taken IST 210 and IST 240. IST 220 and IST 230
are also useful.
Grading:
The project is a group activity. All exercise assignments are
individual assignments unless stated otherwise.
Late Policy: All exercise solution sets must be submitted as hard
copies and are due at the start of class on the date assigned. Starting
immediately after the deadline, 1/3 of the grade will be deducted for
each day late, so an assignment that is three days late earns no credit.
For more information on any of the above, please contact Lee Giles.
Texts:
No text is required. Online papers, chapters, and selections from
online books will be used.
There are many other useful texts in both search and information
retrieval. A good selection can be found at the resources
of the first book.
Popular, less technical books that you may find useful and informative:
John Battelle, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, Portfolio, 2005.
Ian Witten, Marco Gori, Teresa Numerico, Web Dragons: Inside the Myths of Search Engine Technology, Morgan Kaufmann, 2006.
For those of you who want to learn more about Lucene, please see Lucene in Action by Otis Gospodnetic and Erik Hatcher, 2005.
Schedule:
This schedule is subject to change. Please check it on a regular basis
for assignments. Some classes will have online handouts. It is the
student's responsibility to download that material.
Course Materials and References: Course materials can be found here.
There will also be links on the schedule.
Acknowledgements!