Projects
Current Research Projects
The current focus of my research is cyberinfrastructure, information retrieval, knowledge extraction and management, and data mining, both for large and small data and information resources. Our application domain has primarily been the Web and Internet with a focus on academic, scientific and government information, data, and documents. I am also interested in automated methods for developing and designing cyberinfrastructure (also known as e-science) for academic research and other areas. This has led to research in various aspects of social networks and how they facilitate information access. Other interests are knowledge aggregation, novel web tools, portals and architectures.
My recent research and scholarly interests are listed below:
- Design and creation of specialty or vertical search engines, cyberinfrastructure, digital libraries, and focused crawlers.
- Open source infrastructure and toolkit for digital libraries and search engines: SeerSuite.
- SeerSuite code is now available on SourceForge; you can now build your own Seer.
- Next Generation CiteSeer, CiteSeerx, now in beta.
- Improved and enhanced search engines: CiteSeer.IST, which replaced CiteSeer, and BizSeer for academic business documents.
- The first search engine for robots.txt files, BotSeer.
- A protosearch engine for archaeology, ArchSeer.
- New cyberinfrastructure search engine and data portal for environmental chemistry: ChemxSeer
- This project focuses on chemical formula search, table search, figure search, and data search for chemistry.
- Automated methods in systems research and cyberinfrastructure - information and knowledge extraction, data mining, web services. Please see some of my recent papers.
- As an example, please see the work on automated acknowledgement indexing, published in PNAS: who and what gets acknowledged yields insights into scientific and social trends.
- Social network analysis for enhanced search and understanding trends in science and discovering e-communities.
- Mobile social networking using cell phones, MobiSNA.
Past research and scholarly areas which are still of interest:
- Computational models of e-commerce, most recently game markets (letter in Science).
- How do we measure and characterize the web: what is there and what is changing?
Brief Descriptions of New and Old Projects:
- ChemxSeer is a search engine focused on the development of a cyberinfrastructure portal for environmental kinetic chemistry, integrating chemistry-specific search with data repositories and analysis tools.
- BotSeer is a specialty search engine devoted to harvesting and providing search functionality for web site robots.txt files and related information and software.
- Next Generation CiteSeer, CiteSeerx, is a rather new project that has focused on the future of the CiteSeer search engine and digital library.
- CiteSeer.IST was the Penn State home of the academic search engine and digital library CiteSeer. It will eventually be transformed into CiteSeerx.
- eBizSearch was a CiteSeer-like niche search engine and digital library for business schools. eBizSearch was a predecessor to SmealSearch and was also a CiteSeer-like niche search engine for finding and indexing documents about e-business and e-commerce. All of these are now part of BizSeer.
- Inquirus was once a popular content-based metasearch engine.
- Inquirus2 was a preference-based metasearch engine.
I have an interest in machine learning, pattern recognition, neural networks and artificial intelligence, especially as applied to the topics above. I like to use existing methods and develop new methods for novel automated applications in handling and extracting knowledge and information from massive data sets, temporal data, multimedia, etc. My past work has focused on the role of memory in learning and theoretical models of knowledge representation and capture, learning and intelligent multiagent systems, and applications of intelligent systems to computing and computer systems, finance, and signal processing.