A Proof-of-Concept of D³ Record Mining using Domain-Dependent Data
Collection editors:
Chang, Chin-Chen; Gelogo, Yvette E.; Caytiles, Ronnie E.
Title of conference publication:
Software Technology
Subtitle of conference publication:
Proceedings, International Conference, SoftTech 2012, Cebu, Philippines, May 2012
Volume:
5
Conference title:
International Conference on Software Technology (1., 2012, Cebu)
Venue:
Cebu, Philippines
Year of conference:
2012
Date of conference beginning:
29.05.2012
Date of conference ending:
31.05.2012
Publisher:
SERSC
Year:
2012
Pages from - to:
134-139
Language:
English
Abstract:
Our purpose is to perform data record extraction from online event calendars by exploiting sublanguage and domain characteristics. We therefore use so-called domain-dependent data (D³), based entirely on language-specific key expressions and HTML patterns, to recognize every single event given on the investigated web page. One of the most notable advantages of our method is that it does not require any additional classification steps based on machine learning algorithms or keyword extraction methods; it is a so-called one-step mining technique. Another important criterion is that our system is robust to DOM and layout modifications made by web designers. Preliminary experimental results are provided to demonstrate proof of concept of such an approach, tested on websites in the German opera domain. Furthermore, we show that our proposed technique outperforms other data record mining applications run on event sites.