Current Project Links
Current Research Projects
The current focus of my research is cyberinfrastructure, information retrieval, knowledge extraction and management, and data mining for public and private information resources, covering both small and big data, with a particular interest in scholarly big data. Our application domain has primarily been the Web and Internet, with a focus on academic, scientific, and government information, data, and documents. I am also interested in automated methods for developing and designing cyberinfrastructure (also known as e-science) for academic research and related areas. This has led to research in various aspects of social networks and how they facilitate information access. Other interests are knowledge aggregation and architectures.
My recent research and scholarly interests are listed below:
- Design and creation of specialty or vertical search engines, cyberinfrastructure, digital libraries, and focused crawlers.
- Open source infrastructure and toolkits for search engines and digital libraries: SeerSuite.
- SeerSuite code is now available on SourceForge; you can now build your own CiteSeerx-like Seer or just use the special extraction modules.
- YouSeer is a complete and powerful open source search engine available on SourceForge that integrates the open source crawler Heritrix with the open source indexer Solr/Lucene.
- Next Generation CiteSeer, CiteSeerx, built from SeerSuite, now with a new look and author name disambiguation.
- Automated interactive textbook building tool, BBookX.
- Specialty search engines such as:
- AckSeer is a new, and perhaps the only, acknowledgement indexing search engine.
- Need recommendations for your paper? Try our new RefSeer recommendation system.
- A collaboration search tool, CollabSeer, that covers over 400,000 collaborators was just released.
- GrantSeer allowed program managers to search their grant portfolios.
- SeerSeer was based on the CiteSeerX database and allowed search of experts.
- EthnicSeer is a name ethnicity classifier based on name ethnicity as defined in Wikipedia.
- A cyberinfrastructure search engine and data portal built on SeerSuite for environmental chemistry: ChemxSeer
- This project focuses on searching for chemical formulae, table search, figure search, and data search for chemistry.
- Automated methods in systems research and cyberinfrastructure - information and knowledge extraction, data mining, web services. Please see some of my recent papers.
- As an example, please see the work on automated acknowledgement indexing, published in PNAS, which shows that who and what gets acknowledged yields insights into scientific and social trends.
- Social network analysis for enhanced search and understanding trends in science and discovering e-communities.
Past research and scholarly areas which are still of interest:
- Computational models of e-commerce, most recently game markets (letter in Science).
- How do we measure and characterize the web, what's there and what is changing?
Brief Descriptions of Some New and Old Projects:
- ChemxSeer is a search engine focused on the development of a cyberinfrastructure for environmental kinetic chemistry, integrating chemistry-specific search with data repositories and analysis tools.
- Next Generation CiteSeer, CiteSeerx, has focused on the future of the CiteSeer search engine and digital library.
- CiteSeer.IST was the Penn State home of the academic search engine and digital library CiteSeer and has been replaced by CiteSeerx.
- A protosearch engine for archaeology, ArchSeer, primarily focused on map search.
- eBizSearch was a CiteSeer-like niche search engine and digital library for business schools. eBizSearch was a predecessor to SmealSearch and was also a CiteSeer-like niche engine for finding and indexing documents about e-business and e-commerce. All these were rolled into BizSeer.
- Acknowledgement search was part of the old CiteSeer.IST project and has now been replaced by AckSeer.
- Mobile social networking using mobile phones, MobiSNA, was one of the first to use social network ranking of videos.
- BotSeer was a specialty search engine devoted to harvesting and providing search functionality for web site robots.txt files and related information and software.
- Inquirus was once a popular content-based metasearch engine.
- Inquirus2 was a preference-based metasearch engine.
I have an interest in machine learning, pattern recognition, text mining, information extraction and retrieval, and artificial intelligence, especially as applied to the topics above. I like to use existing methods and develop new methods for novel automated applications in handling and extracting knowledge and information from massive data sets, temporal data, multimedia, etc. My past work has focused on the role of memory in learning and theoretical models of knowledge representation and capture, learning and intelligent multiagent systems, and applications of intelligent systems to computing and computer systems, finance, and signal processing.