Projects
Current and Old Project Descriptions
Current Research Projects
The focus of my research has been cyberinfrastructure, information retrieval, knowledge extraction and management, and data mining for public and private information resources, both small and big data, with a particular interest in scholarly big data. My application domain has primarily been the Web and Internet, with a focus on academic, scientific, and government information, data, and documents. I am also interested in automated methods for designing and developing cyberinfrastructure (also known as e-science) for academic research and related areas. This has led to research on various aspects of social networks and how they facilitate information access. Other interests are knowledge aggregation and architectures.
My recent research and scholarly interests are listed below:
- Design and creation of specialty or vertical search engines, cyberinfrastructure, digital libraries, and focused crawlers.
- Open source infrastructure and toolkits for search engines and digital libraries: SeerSuite.
- SeerSuite code is now available on GitHub; you can build your own CiteSeerx-like Seer or just use the special extraction modules.
- YouSeer is a complete and powerful open source search engine, available on SourceForge, that integrates the open source crawler Heritrix with the open source indexer Solr/Lucene.
- Next Generation CiteSeer, CiteSeerx, built from SeerSuite.
- Automated interactive textbook building tool, BBookX.
- Specialty search engines such as:
- PseudoSeer searches arXiv papers for pseudocode.
- PrivaSeer is a search engine for web privacy policies.
- COVIDSeer permitted searches of COVID-related papers.
- CSSeers was an expert recommendation search engine.
- RefSeer was a citation recommendation system.
- A collaboration search tool, CollabSeer, covered over 400,000 collaborators in CiteSeerX.
- TableSeer was a table search engine integrated into ChemXSeer and CiteSeerX.
- EthnicSeer was a name ethnicity classifier based on name ethnicity as defined in Wikipedia.
- AckSeer was an early acknowledgement indexing search engine.
- A cyberinfrastructure search engine and data portal built on SeerSuite for environmental chemistry: ChemxSeer.
- This project focused on searching for chemical formulae, table search, figure search, and data search for chemistry.
- Automated methods in systems research and cyberinfrastructure - information and knowledge extraction, data mining, web services. Please see some of my recent papers.
- As an example, please see the work published in PNAS on automated acknowledgement indexing; who and what gets acknowledged yields insights into scientific and social trends.
- Social network analysis for enhanced search and understanding trends in science and discovering e-communities.
Past research and scholarly areas which are still of interest:
- Recent work on deep learning with recurrent neural networks and sequence processing. Please see new work.
- Computational models of e-commerce, most recently game markets (letter in Science).
- How do we measure and characterize the web, what's there and what is changing?
Brief Descriptions of Some Past Projects:
- ChemxSeer was a search engine focused on the development of a cyberinfrastructure portal for environmental kinetic chemistry, integrating chemistry-specific search with data repositories and analysis tools.
- Next Generation CiteSeer, CiteSeerx, has focused on the future of the CiteSeer search engine and digital library.
- CiteSeer.IST was the Penn State home of the academic search engine and digital library CiteSeer and has been replaced by CiteSeerx.
- A protosearch engine for archaeology, ArchSeer, primarily focused on map search.
- eBizSearch was a CiteSeer-like niche search engine and digital library for business schools. eBizSearch was a predecessor to SmealSearch and was also a CiteSeer-like niche search engine for finding and indexing documents about e-business and e-commerce. All of these were rolled into BizSeer.
- Acknowledgement search was part of the old CiteSeer.IST project and has now been replaced by AckSeer.
- MobiSNA, a project on mobile social networking using mobile phones, was one of the first to use social network ranking of videos.
- BotSeer was a specialty search engine devoted to harvesting and providing search functionality for web site robots.txt files and related information and software.
- Inquirus was once a popular content-based metasearch engine.
- Inquirus2 was a preference-based metasearch engine.