Projects
Current Research Projects
The current focus of my research is information retrieval, knowledge extraction and management, and data mining, for both large and small data and information resources. Our application domain has primarily been the Web and Internet, with a focus on academic information, trends, and documents. I am also interested in automated methods for developing and designing cyberinfrastructure (also known as e-science) for academic research and other areas, which has led us to research in social networks. My other interests include e-commerce, knowledge aggregation, and novel web tools, portals, and architectures.
My recent research and scholarly interests are listed below:
- Design and creation of specialty or vertical search engines, cyberinfrastructure, digital libraries, and focused crawlers.
- Next Generation CiteSeer, CiteSeerx, now in alpha!
- Improved and enhanced search engines: CiteSeer.IST, BizSeer.
- The first search engine for robots.txt files, BotSeer.
- A protosearch engine for archaeology, ArchSeer.
- New cyberinfrastructure search engine and data portal for environmental chemistry: ChemxSeer.
- This project focuses on searching for chemical formulae, table search, figure search and data search for chemistry.
- Automated methods in systems research and cyberinfrastructure - information and knowledge extraction, data mining, web services. Please see some of my recent papers.
- As an example, please see the work on automated acknowledgement indexing, published in PNAS: analyzing who and what gets acknowledged yields insights into scientific and social trends.
- Social network analysis for enhanced search and understanding trends in science and discovering e-communities.
Past research and scholarly areas which are still of interest:
- Computational models of e-commerce, most recently game markets (letter in Science).
- How do we measure and characterize the web: what is there, and what is changing?
New and Old Projects:
- ChemxSeer is a new project focused on the development of a cyberinfrastructure portal for environmental kinetic chemistry, integrating chemistry-specific search with data repositories and analysis tools.
- BotSeer is a specialty search engine devoted to harvesting and providing search functionality for web site robots.txt files and related information and software.
- Next Generation CiteSeer, CiteSeerx, is a new project focused on the future of the CiteSeer search engine and digital library.
- CiteSeer.IST is the new Penn State home of the academic search engine and digital library CiteSeer.
- SmealSearch is a CiteSeer-like niche search engine and digital library for business schools. eBizSearch was a predecessor to SmealSearch and was a CiteSeer-like niche search engine for finding and indexing documents about e-business and e-commerce.
- Inquirus was once a popular content-based metasearch engine.
- Inquirus2 was a preference-based metasearch engine.
I have an interest in machine learning, pattern recognition, neural networks and artificial intelligence, especially as applied to the topics above. I like to use existing methods and develop new methods for novel automated applications in handling and extracting knowledge and information from massive data sets, temporal data, multimedia, etc. My past work has focused on the role of memory in learning and theoretical models of knowledge representation and capture, learning and intelligent multiagent systems, and applications of intelligent systems to computing and computer systems, finance, and signal processing.