Single and Combined Features for the Detection of Anglicisms in German and Afrikaans
| Content Provider | Semantic Scholar |
|---|---|
| Author | Leidig, Sebastian |
| Copyright Year | 2014 |
| Abstract | We develop, analyze and combine features for the automatic detection of Anglicisms in German and Afrikaans text. Detecting Anglicisms can improve automatic speech recognition, speech synthesis and other fields such as natural language processing. To evaluate our methods, we collected and annotated two German word lists from different domains (IT, general news). We also applied our detection methods to an Afrikaans word list from the NCHLT corpus. Our features are based on grapheme perplexity, grapheme-to-phoneme (G2P) confidence, Google hit counts, and spell-checker dictionary and Wiktionary lookups. With our G2P confidence and Wiktionary features, we introduce new approaches to detecting Anglicisms. Comparing features based on English models with those based on models of the matrix language allows us to avoid determining thresholds in a supervised way. Furthermore, we do not rely on training data that is expensive to annotate; instead, we use available resources such as word lists and pronunciation dictionaries. Our best single feature is based on G2P confidence, with an f-score of up to 70.39%. Combining our features using voting, a decision tree or a support vector machine (SVM) gives further improvements, especially where the single features performed poorly. We achieve up to 44% relative improvement in f-score on our Afrikaans data. Our best result with a combination is an f-score of 75.44%. |
| File Format | PDF HTM / HTML |
| Alternate Webpage(s) | http://csl.anthropomatik.kit.edu/downloads/BA_SebastianLeidig.pdf |
| Language | English |
| Access Restriction | Open |
| Content Type | Text |
| Resource Type | Article |
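
The abstract's idea of comparing an English model with a matrix-language model, rather than tuning a threshold, can be illustrated with a small sketch. Everything below is an assumption made for illustration: the add-one-smoothed grapheme bigram model, the toy word lists, and the function names (`train_bigram_model`, `perplexity`, `looks_english`, `majority_vote`) are hypothetical and are not taken from the thesis, which the abstract does not describe at this level of detail.

```python
import math
from collections import Counter


def train_bigram_model(words):
    """Count grapheme bigrams and their left contexts, using word-boundary markers."""
    bigrams, contexts, alphabet = Counter(), Counter(), set()
    for word in words:
        chars = ["<"] + list(word.lower()) + [">"]
        alphabet.update(chars)
        for prev, cur in zip(chars, chars[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, alphabet


def perplexity(word, model, vocab_size):
    """Per-grapheme perplexity of a word under an add-one-smoothed bigram model."""
    bigrams, contexts, _ = model
    chars = ["<"] + list(word.lower()) + [">"]
    log_prob = 0.0
    for prev, cur in zip(chars, chars[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    return math.exp(-log_prob / (len(chars) - 1))


def looks_english(word, english_model, matrix_model):
    """Flag a word when the English model fits it better; no tuned threshold needed."""
    vocab_size = len(english_model[2] | matrix_model[2])
    return (perplexity(word, english_model, vocab_size)
            < perplexity(word, matrix_model, vocab_size))


def majority_vote(votes):
    """Combine boolean feature decisions: True if more than half vote True."""
    return sum(votes) > len(votes) / 2


# Toy word lists (invented for this sketch, far too small for real use).
english_words = ["download", "update", "software", "keyboard", "cloud", "browser"]
german_words = ["herunterladen", "aktualisierung", "tastatur", "wolke", "rechner", "zeichen"]

english_model = train_bigram_model(english_words)
german_model = train_bigram_model(german_words)

for candidate in ["download", "tastatur"]:
    print(candidate, looks_english(candidate, english_model, german_model))
```

In a combined setup, each feature (perplexity, G2P confidence, dictionary lookups and so on) would contribute one boolean to `majority_vote`; the decision tree and SVM combinations mentioned in the abstract are not sketched here.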