AtheneForschung - Information Portal of UniBw M

Authors:

Bohne, Thomas

Document type:

Dissertation / Thesis

Title:

Heuristic Strategies for Single Document Analysis

Supervisor:

Borghoff, Uwe M., Prof. Dr.

Reviewers:

Borghoff, Uwe M., Prof. Dr.; Minas, Mark, Prof. Dr.-Ing.

Date of thesis submission:

11.05.2015

Date of oral examination:

21.09.2015

Publication date:

14.12.2015

Year:

2015

Pages (monograph):

153

Language:

English

Subject headings:

Automatic word recognition ; Keyword ; Information retrieval ; Text structure ; Analysis ; Heuristics

Keywords:

Keyword Extraction, Single Document Analysis, Change-Point Detection

Abstract:

The immense growth of digital text data creates demand for automatic text analysis tools for information retrieval. A plain text provides sufficient information for a heuristic approach to identify meaningful keywords. Texts, whether documents or text streams, also have an inherent structure that conveys information about their content. In this thesis, two approaches for retrieving meaningful information from single documents are developed: keyword extraction and the detection of structural changes in texts. A combination of multiple heuristic keyword extraction algorithms is superior to the individual methods and can improve the quality of the results significantly. To pursue this idea in the first part of my thesis, I compare different combination methods and use principal component analysis (PCA) as a parameter-free and effective method to determine optimal combination candidates. I then demonstrate the success of these methods with an efficient and flexible keyword extraction approach that is language-independent, fast, and does not require a training phase. The results of this algorithm are deemed meaningful, and its performance is superior to that of the well-known TF-IDF. In the second part of my thesis, I analyze the structure of text documents and develop a novel algorithm that detects structural changes. This algorithm identifies fluctuations in the composition of a text. It is flexible, language-independent, and works on single documents as well as on text streams of indefinite length. I demonstrate the accuracy of my approach using cogent real-world examples and show its compelling performance in comparison with a benchmark algorithm. As an application of my work, I integrate a keyword extraction approach into the CommunityMashup as part of a collaboration. The CommunityMashup is a data aggregation solution for different social networks. By extracting keywords in almost real time, we are able to identify new relations between contents and people and to visualize them with an interactive and platform-independent solution.
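
The idea of combining several heuristic keyword scores on a single document, without a training phase or a reference corpus, can be illustrated with a minimal sketch. It is not the algorithm developed in the thesis: the two heuristics (relative frequency and occurrence spread), the mean-rank combination, and all function names and parameters (`combined_keywords`, `top_k`, `min_length`) are assumptions chosen purely for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def frequency_scores(tokens):
    """Heuristic 1 (assumed): relative frequency of each word."""
    counts = Counter(tokens)
    return {word: count / len(tokens) for word, count in counts.items()}


def spread_scores(tokens):
    """Heuristic 2 (assumed): normalised distance between the first and
    last occurrence of each word in the document."""
    first, last = {}, {}
    for index, word in enumerate(tokens):
        first.setdefault(word, index)
        last[word] = index
    span = max(len(tokens) - 1, 1)
    return {word: (last[word] - first[word]) / span for word in first}


def ranks(scores):
    """Map each word to its rank (1 = highest score).
    Ties are broken alphabetically to keep the result deterministic."""
    ordered = sorted(scores, key=lambda word: (-scores[word], word))
    return {word: rank for rank, word in enumerate(ordered, start=1)}


def combined_keywords(text, top_k=3, min_length=4):
    """Combine both heuristics by averaging their ranks and return the
    top_k words. Short tokens are dropped as a crude stop-word filter."""
    tokens = [t for t in tokenize(text) if len(t) >= min_length]
    if not tokens:
        return []
    rank_tables = [ranks(frequency_scores(tokens)), ranks(spread_scores(tokens))]
    mean_rank = {
        word: sum(table[word] for table in rank_tables) / len(rank_tables)
        for word in rank_tables[0]
    }
    return sorted(mean_rank, key=lambda word: (mean_rank[word], word))[:top_k]


sample = ("Keyword extraction finds keyword candidates. "
          "Extraction of a keyword needs no training.")
print(combined_keywords(sample, top_k=2))
```

Averaging ranks rather than raw scores keeps the heuristics comparable even when their score ranges differ; other combination schemes would fit the same structure.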

DDC notation:

005.741

URN:

urn:nbn:de:bvb:706-4421

Faculty:

Faculty of Computer Science

Institute:

INF 2 - Institute for Software Technology

Professorship:

Borghoff, Uwe M.

Open Access (yes or no)?:

Yes