Authors:
Lee, Yeong Su; Geierhos, Michaela 
Document type:
Conference Paper
Title:
Business Specific Online Information Extraction from German Websites 
Collection editors:
Aly, Robin; Hauff, Claudia; Hiemstra, Djoerd; Huibers, Theo W. C.; de Jong, Franciska M. G. 
Title of conference publication:
Proceedings of the 9th Dutch-Belgian Information Retrieval Workshop 
Conference title:
Dutch-Belgian Information Retrieval Workshop (9., 2009, Enschede) 
Venue:
Enschede, The Netherlands 
Year of conference:
2009 
Place of publication:
Twente 
Publisher:
Centre for Telematics and Information Technology (CTIT), University of Twente 
Year:
2009 
Pages from - to:
79-86 
Language:
English
Keywords:
company search ; information extraction ; sublanguage 
Abstract:
This paper presents a system that uses the domain name of a German business website to locate its information pages (e.g. company profile, contact page, imprint) and then identifies business specific information. We therefore concentrate on the extraction of characteristic vocabulary like company names, addresses, contact details, CEOs, etc. Above all, we interpret the HTML structure of documents and analyze some contextual facts to transform the unstructured web pages into structured forms. Ou...
 
ISSN:
0929-0672 
Open Access:
No