A Proof-of-Concept of D³ Record Mining using Domain-Dependent Data
Editors (collection):
Chang, Chin-Chen; Gelogo, Yvette E.; Caytiles, Ronnie E.
Conference publication title:
Software Technology
Conference publication subtitle:
Proceedings, International Conference, SoftTech 2012, Cebu, Philippines, May 2012
Volume:
5
Conference title:
International Conference on Software Technology (1st, 2012, Cebu)
Conference venue:
Cebu, Philippines
Conference year:
2012
Conference start date:
29.05.2012
Conference end date:
21.05.2012
Publisher:
SERSC
Year:
2012
Pages:
134-139
Language:
English
Abstract:
Our purpose is to perform data record extraction from online event calendars by exploiting sublanguage and domain characteristics. We therefore use so-called domain-dependent data (D³), based entirely on language-specific key expressions and HTML patterns, to recognize every single event given on the investigated web page. One of the most remarkable advantages of our method is that it does not require any additional classification steps based on machine learning algorithms or keyword extraction methods; it is a so-called one-step mining technique. Another important criterion is that our system is robust to DOM and layout modifications made by web designers. Preliminary experimental results are provided to demonstrate proof-of-concept of such an approach, tested on websites in the German opera domain. Furthermore, we show that our proposed technique outperforms other data record mining applications run on event sites.