IST 441 Course Project
This project counts for 35% of your grade and is a team project for
undergraduates. Teams can be found here.
Under certain conditions, a project can be done individually. For
questions about teams or individual projects, please see the instructor right away.
The project's share of your final grade is divided as follows:
* Search engine - 20%
* Report - 10%
* Presentations - 5% (both will be graded)
In this project, you will build two search engines for a customer.
See below for a suggested customer list.
1) You will use the open-source software Elasticsearch, available on
the class server, to build a vertical (specialty) search engine
for a customer. You can also download and run Elasticsearch on
your own server or laptop; however, you must give the instructor
access to run queries.
2) You will also build a Google Custom Search engine on the same topic.
Depending on the project, some students will be allowed to use other
open-source search engine tools. You will deliver the finished search
engine to the customer, and your customer must acknowledge the
successful transfer of the project.
1) Vertical/specialty Search Engine:
Using Elasticsearch, you will construct a specialty search engine and
crawl the web to build a dataset of at least 1,000 documents of
interest. The topic of your specialty engine must be approved by the
instructor and the customer, and it should be of interest to the
customer. See the instructor for suggestions if necessary. (Other
projects are possible but must be approved by the instructor.)
You will then index these documents with Elasticsearch and provide a
user interface, built with Kibana or another tool, that queries the
index and ranks the results for an arbitrary query. The interface
must allow others to query your data and index.
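The indexing and querying step above can be sketched against Elasticsearch's REST API. This is a minimal sketch using only the Python standard library, not a required design. It assumes a local cluster at `http://localhost:9200` with security disabled, a hypothetical index named `ist441_docs`, and documents with hypothetical `url` and `content` fields. `bulk_body` builds the newline-delimited JSON that the `_bulk` endpoint expects. `match_query` builds a full-text `match` query, and Elasticsearch returns its hits ordered by relevance score (`_score`).

```python
import json
import urllib.request

# Assumptions: a local, unsecured cluster and a hypothetical index name.
ES_URL = "http://localhost:9200"
INDEX = "ist441_docs"


def bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk endpoint.

    Each document becomes two lines: an action line naming the target
    index and document id, followed by the document source itself.
    """
    lines = []
    for doc_id, doc in docs.items():
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def match_query(text, field="content", size=10):
    """Build a _search body for a full-text match query on one field."""
    return {"query": {"match": {field: text}}, "size": size}


def post(url, body, content_type="application/json"):
    """POST a string or JSON-serializable body and decode the JSON response."""
    data = body if isinstance(body, str) else json.dumps(body)
    request = urllib.request.Request(
        url,
        data=data.encode("utf-8"),
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())
```

Against a running cluster, you would call `post(f"{ES_URL}/{INDEX}/_bulk", bulk_body(INDEX, docs), "application/x-ndjson")` to index the crawled pages. You would then call `post(f"{ES_URL}/{INDEX}/_search", match_query("your query"))` and read the ranked results from `["hits"]["hits"]`. A secured cluster would also need HTTPS and credentials.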
If possible, students should not use previously crawled collections,
but they may use selections from search engine results. Part of this
project is crawling for "your" special collection.
2) Google Custom Search:
On the same topic, you will build a search engine using Google Custom Search.
- Provide a query box for your Google Custom Search engine within
your specialty search engine's interface.
Please exercise good judgment in what you crawl, and use an ethical
crawler that obeys the Robots Exclusion Protocol (robots.txt) and does
not overload a web site with requests.
DO NOT CRAWL:
- SEARCH ENGINES WITHOUT REGISTERING
There are many crawlers available. We will install Scrapy
on the IST server. Other crawlers you can use are Heritrix and Nutch.
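One way to meet the ethical-crawling requirement above is to check each URL against the site's robots.txt rules and to wait between requests to the same host. The Python sketch below uses the standard library's `urllib.robotparser`. The class name `PoliteFetcher`, the user-agent string, and the two-second default delay are illustrative assumptions, not course requirements. The robots.txt lines are passed in directly so the rules can be tested without network access. A real crawler would first fetch them from each host.

```python
import time
import urllib.robotparser
from urllib.parse import urlparse


class PoliteFetcher:
    """Hypothetical helper: checks robots.txt rules and spaces out requests per host."""

    def __init__(self, user_agent, default_delay=2.0):
        self.user_agent = user_agent
        self.default_delay = default_delay  # assumed delay when robots.txt gives none
        self.parsers = {}     # host -> RobotFileParser
        self.last_fetch = {}  # host -> monotonic time of the last request

    def load_rules(self, host, robots_lines):
        # In a real crawler these lines come from https://<host>/robots.txt.
        rp = urllib.robotparser.RobotFileParser()
        rp.parse(robots_lines)
        self.parsers[host] = rp

    def allowed(self, url):
        rp = self.parsers.get(urlparse(url).netloc)
        # Conservative choice: skip hosts whose rules were never loaded.
        return rp is not None and rp.can_fetch(self.user_agent, url)

    def delay_for(self, host):
        # Prefer the site's Crawl-delay; otherwise fall back to the default.
        rp = self.parsers.get(host)
        delay = rp.crawl_delay(self.user_agent) if rp else None
        return float(delay) if delay is not None else self.default_delay

    def wait_turn(self, url):
        # Sleep until delay_for(host) seconds have passed since the last request to this host.
        host = urlparse(url).netloc
        elapsed = time.monotonic() - self.last_fetch.get(host, float("-inf"))
        remaining = self.delay_for(host) - elapsed
        if remaining > 0:
            time.sleep(remaining)
        self.last_fetch[host] = time.monotonic()
```

If you use Scrapy instead, its `ROBOTSTXT_OBEY` and `DOWNLOAD_DELAY` settings provide the same two safeguards without custom code.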
To complete this project, you need to do the following:
1) Submit two hard copies of a comprehensive final report (about 20
pages) that discusses the specialty search engine, how it works, and
what was crawled.
2) Submit a PDF of the report via email.
The report should describe the motivation for your vertical engine and
discuss in detail the indexing, the query process, and how the search
engine calculates relevance. The report must include links to your
- specialty search engine,
- Google Custom Search engine.
* Compare your specialty engine with your Google Custom Search engine
in terms of relevance for at least 20 queries.
* Provide evidence of the crawled documents (the URLs crawled), the
built index, and the query engine, either by submitting the index on a
memory stick or CD or by giving the instructor access to a web
directory of the documents. This is not necessary for those who build
their search engine on the IST 441 server.
* Provide the web link to the query interface
that the instructor and TA can test and use.
* Deliver the search engine to the customer and
have the customer contact the instructor and TA.
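The 20-query relevance comparison required above can be summarized with a standard measure such as precision at k, the fraction of the top k results judged relevant. The Python sketch below is one possible approach, not a prescribed method. It assumes you have recorded each engine's ranked result URLs for every query and judged by hand which URLs are relevant. The function names and the default cutoff k = 10 are hypothetical choices.

```python
def precision_at_k(ranked_urls, relevant_urls, k=10):
    """Fraction of the top-k positions filled by relevant results.

    Missing results (fewer than k returned) count as non-relevant.
    """
    hits = sum(1 for url in ranked_urls[:k] if url in relevant_urls)
    return hits / k


def compare_engines(judgments, results_a, results_b, k=10):
    """Return mean precision@k for two engines and the per-query scores.

    judgments: query -> set of URLs judged relevant (by hand)
    results_a, results_b: query -> ranked list of result URLs
    """
    per_query = {}
    for query, relevant in judgments.items():
        per_query[query] = (
            precision_at_k(results_a.get(query, []), relevant, k),
            precision_at_k(results_b.get(query, []), relevant, k),
        )
    n = len(per_query)
    mean_a = sum(a for a, _ in per_query.values()) / n
    mean_b = sum(b for _, b in per_query.values()) / n
    return mean_a, mean_b, per_query
```

Reporting the per-query pairs alongside the two means shows where each engine does better. That breakdown is useful material for the report's relevance discussion.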
Presentations (all in PowerPoint):
You will give two presentations on your search project, and both
are graded. The first, given mid-semester, is a brief talk of 15
minutes or less; the final one, given at the end of the semester,
lasts 30 minutes or more.
1st presentation: Introductory presentation on your specialty search
engine topic (15 minutes or less).
- Your choice of a search topic should be motivated.
- Why is this a good topic for which to build a specialty search engine?
- How hard will it be to get the documents?
- What is the competition?
- Who is your customer, and why is your customer interested?
Final presentation: Overview of the search engine (30 minutes or more).
- Discuss what was discovered about the topic chosen.
- What was crawled and why?
- If appropriate, give a live demonstration of the search engine.
- The customer should be identified; the customer's address and email
must be provided.
Both of your presentations must be professionally prepared in
PowerPoint and well organized. Hard copies must be handed in before
each presentation.
The following are due by 8 am April 29:
- two professional-quality hard copies and a PDF of the report;
- the URL of the built search engine interface (with the index, if the
search engine is not built on the IST441 server).
The report and the PowerPoint slides of the presentations must be
provided to both the instructor and the search engine customer.
The report must be submitted in both hard copy and PDF to receive full credit.
A web link to your search engine query interface must remain live for
the entire last week of the semester.
Your customer will help determine what topic the search engine
will cover.
The customer must notify the instructor and TA of the final
acceptance of the search engine project and report.