IST 441 Information Retrieval and Search Engines
Spring 2010
This course can be counted for the IST 402 requirement.
Time and Place: Spring 2010. 3:35-6:35, Mondays, 205 IST Bldg.
Office Hours: Dr. Lee Giles, IST 311A, 2-3 pm, Tuesday or TBA.
TA: Wednesday 5-6 pm, Thursday 5-6 pm or TBA.
Course Overview:
This is a three-hour course for juniors, seniors and graduate students
that meets once a week. The course will cover: organization,
representation, and access to information; categorization, indexing,
and content analysis; data structures for unstructured data; design and
maintenance of such databases, indexing and indexes, retrieval and
classification schemes; use of codes, formats, and standards; analysis,
construction and evaluation of search and navigation techniques; and
search engines and how they relate to the above.
This is an introductory course for IST students covering the practices,
issues, and theoretical foundations of organizing and analyzing
information and information content for the purpose of providing
intellectual access to textual and non-textual information resources.
This course will introduce students to the principles of information
storage and retrieval systems and databases. Students will learn how
effective information search and retrieval is interrelated with the
organization and description of information to be retrieved. Students
will also learn to use a set of tools and procedures for organizing
information, will become familiar with the techniques involved in
conducting effective searches of print and online information resources,
and will build a vertical/specialty search engine.
Course Mission Statement:
This course is intended to prepare students to design, develop and use
information retrieval and search systems. We will explore the
practices, issues and
theoretical foundations of organizing and analyzing information and
information content for the purpose of providing intellectual access to
textual and non-textual information resources. This course will
introduce students to the principles of information storage and
retrieval systems and databases. They will learn how effective
information search and retrieval is interrelated with the organization
and description of information to be retrieved. Students will also
learn to use a set of tools and procedures for organizing information,
and will become familiar with the techniques involved in conducting
effective searches of print and online information resources. The
course also introduces the major types of information retrieval
systems and search engines, the different theoretical foundations
underlying these systems, and the methods and measures that can be used
to evaluate them.
These topics will be examined through readings, discussion, hands-on
experience using and constructing a search engine, and through
exercises designed to help explore the capabilities and utility of
different retrieval systems. The class project will consist of
building a search engine on a specific topic using open source search
engine software.
Course Prerequisites:
IST students should have taken IST 210 and IST 240. IST 220 and IST 230
are also useful. Other students should consult with the instructor.
Grading:
The project is a group activity unless otherwise approved by the
instructor. All exercise assignments, unless stated otherwise, are
individual assignments.
Late Policy:
All exercise solution sets are hardcopies and are due at midnight on the
date assigned. Starting right after the required submission date, 1/3
of the grade will be deducted for every day late until no credit
remains.
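As an informal illustration of the late-penalty arithmetic (not an official grading tool), the short Python sketch below assumes that each day started after the deadline removes 1/3 of the credit available, so that nothing remains after three days. The function name late_score and its parameters are hypothetical and chosen only for this example; the instructor's actual calculation may differ.

```python
from fractions import Fraction


def late_score(raw_score, days_late):
    """Return the score left after the late deduction.

    Assumption: each day late removes 1/3 of the available credit,
    so the available fraction is 1 - days_late/3, floored at zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    available = max(Fraction(0), 1 - Fraction(days_late, 3))
    return Fraction(raw_score) * available


if __name__ == "__main__":
    # A solution set that would have earned 90 points if on time.
    for days in range(5):
        print(days, late_score(90, days))
```

Under this reading, a 90-point solution set is worth 60 points one day late, 30 points two days late, and nothing from the third day on.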
For more information on any of the above, please contact Lee Giles.
Texts and Readings:
No text is required. Online papers and chapters and selections from
online books will be used.
There are many other useful texts in both search and information
retrieval. A good selection can be found at the resources of the first
book.
Popular but less technical books that you may find useful and very
informative:
John Battelle, The Search: How Google and Its Rivals Rewrote the Rules
of Business and Transformed Our Culture, Portfolio, 2005.
Ian Witten, Marco Gori, Teresa Numerico, Web Dragons: Inside the Myths
of Search Engine Technology, Morgan Kaufmann, 2006.
For those of you who want to learn more about Lucene, please see
Lucene in Action by Otis Gospodnetic and Erik Hatcher, 2005.
Email:
All email to the instructor and TA about this class should contain
"IST441" in the subject line. For example, the subject line might
read "IST441: Question about ....". Email without this
information might be deleted by spam filters or placed in a folder to
be read at a later date. Email with the appropriate identifier
will usually be read within 24 hours of being received.
Schedule:
This schedule is subject to change. Please check it on a regular basis
for assignments. Some classes will have online handouts. It is the
student's responsibility to download that material.
Course Materials and References: Course materials can be found here.
There will also be links on the schedule.
Acknowledgements!