IST 441: Information Retrieval and Search Engines
Spring 2014 will be similar, with appropriate changes.
This course can be counted for the IST 402 requirement.
Time and Place: Spring 2013. 3:35-6:35, Monday, 205 IST Bldg.
Office Hours: Dr. Lee Giles, IST 311A, 4-5 PM Tuesday or TBA
TA: Steve Carman, IST 310, 4-5 PM Wednesday, Thursday or TBA
This is a course for juniors, seniors and graduate students that meets
once a week. The course covers: organization, representation, and access
to information; categorization, indexing, and content analysis; data
structures for unstructured data; design of such databases, indexing and
indexes, retrieval and classification schemes; use of codes, formats,
and standards; analysis, construction and evaluation of search
techniques; and search engines and how they relate to the above.
This course is intended to prepare students to understand, design,
develop and use
information retrieval and search systems.
IST students should have taken IST 210 and IST 240. IST 220 and
230 are also useful. Other students should consult with the instructor.
The schedule is subject to change. Please check it on a regular basis
for assignments. The reading list is here; most classes will have
online handouts. It is the student's responsibility to download that
material.
Materials and References: Course materials can be
found here. There will also be links on the schedule.
The project is a group activity unless otherwise approved by the
instructor. Unless stated otherwise, all exercise assignments are
individual. Solution sets are hardcopies and are due at midnight on the
date assigned. Starting right after the required submission date, 1/3
of the grade will be deducted for every day tardy until no grade
remains.
For more information on any of the above, please contact Lee Giles.
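As an illustration of the late-submission rule stated above, the arithmetic can be sketched in a few lines of Python. This is only one reading of the policy, not an official grading tool: it assumes the deduction is one third of the earned score per whole day late, so nothing remains after three days. The function name `late_penalty_score` and the 90-point example are hypothetical, and the syllabus does not say how partial days are counted.

```python
from fractions import Fraction


def late_penalty_score(score, days_late):
    """Return the score left after the late deduction.

    Assumption: 1/3 of the earned score is deducted per whole day late,
    and the result never drops below zero.
    """
    if days_late < 0:
        raise ValueError("days_late must be non-negative")
    # Fraction of the score that survives: 1, 2/3, 1/3, then 0.
    remaining = max(Fraction(0), 1 - Fraction(days_late, 3))
    return float(score * remaining)


# Hypothetical 90-point submission handed in 0 to 4 days late.
for days in range(5):
    print(days, late_penalty_score(90, days))
```

Under this reading, a submission that earned 90 points keeps 60 points if it is one day late and 30 points if it is two days late.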
No text is required. Online papers, chapters, and selections from
online books will be used.
There are many other useful texts on both search and information
retrieval. A good selection, though a bit outdated, can be found in the
resources section of the first book.
Popular but less technical books that you may find useful:
John Battelle, The Search: How Google and Its Rivals Rewrote the Rules of Business and Transformed Our Culture, Portfolio, 2005.
Ian Witten, Marco Gori, Teresa Numerico, Web Dragons: Inside the Myths of Search Engine Technology, Morgan Kaufmann, 2006.
We will be using the popular open source enterprise search platform
Solr/Lucene, which is based on the even more popular Lucene.
All email to the instructor and TA about this class should contain
"IST441" in the subject line. For example, the subject line might
read "IST441: Question about ....". Email without this
information might be deleted by spam filters or placed in a folder to
be read at a later date. Email with the appropriate identifier
will usually be read within 24 hours of being received.
Reuse: Materials from this
course can be publicly reused in other courses.