iDEA: DREXEL LIBRARIES E-REPOSITORY AND ARCHIVES
Robust knowledge extraction over large text collections
Details
Title
Robust knowledge extraction over large text collections
Author(s)
Song, Min
Advisor(s)
Song, Il-Yeol
Keywords
Information Science; Information retrieval; QUERY (Information retrieval system)
Date
2005-05
Publisher
Drexel University
Thesis
Ph.D., Information Systems -- Drexel University, 2005
Abstract
Automatic knowledge extraction over large text collections is challenging because of several constraints: the need for large annotated training data, the extensive manual processing the data requires, and the huge number of domain-specific terms. To address these constraints, this study proposes and develops a complete solution for extracting knowledge from large text collections with minimal human intervention. As a testbed, a novel, robust, high-quality knowledge extraction system called RIKE (Robust Iterative Knowledge Extraction) has been developed. RIKE consists of two major components: DocSpotter and HiMMIE. DocSpotter queries and retrieves promising documents for extraction. HiMMIE extracts target entities from the documents selected by DocSpotter, using a Mixture Hidden Markov Model. Three research questions are examined to evaluate RIKE: 1) How accurately does RIKE retrieve promising documents for information extraction from huge text collections such as MEDLINE or TREC? 2) Does an ontology enhance the accuracy of RIKE in retrieving promising documents? 3) How well does RIKE extract target entities from a huge medical text collection, MEDLINE? The major contributions of this study are: 1) automatic unsupervised query generation for effective retrieval from text databases is proposed and evaluated; 2) Mixture Hidden Markov Models for automatic instance extraction are proposed and tested; 3) three ontology-driven query expansion algorithms are proposed and evaluated; and 4) object-oriented methodologies are adopted for the knowledge extraction system. Extensive experiments show RIKE to be a robust, high-quality knowledge extraction technique. DocSpotter outperforms other leading techniques for retrieving promising documents for extraction by 15.5% to 35.34% in P@20. HiMMIE improves extraction accuracy by 9.43% to 24.67% in F-measure.
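The abstract reports results in P@20 and F-measure without defining them. As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed. It is not taken from the thesis: the function names, the document and entity identifiers, and the choice of a balanced F-measure (beta = 1) are all assumptions for illustration.

```python
from typing import Sequence, Set


def precision_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked documents that are relevant (P@k).

    Conventional definition: the denominator is k even if fewer than
    k documents were retrieved. Identifiers here are hypothetical.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def f_measure(extracted: Set[str], gold: Set[str], beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall over extracted items.

    beta = 1.0 gives the balanced F1 score; whether the thesis uses
    beta = 1 is an assumption, not something the abstract states.
    """
    true_positives = len(extracted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(extracted)
    recall = true_positives / len(gold)
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    # Toy data, not results from the thesis.
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3"}
    print(precision_at_k(ranking, relevant, k=4))
    print(f_measure({"geneA", "geneB"}, {"geneA", "geneC"}))
```

Under these definitions, P@20 measures how many of the first 20 retrieved documents are useful for extraction, while the F-measure balances the correctness of extracted entities against their coverage.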
URI
http://hdl.handle.net/1860/495
In Collections
Theses, Dissertations, and Projects