A New Framework for Recognition of Heavily Degraded Characters in Historical Typewritten Documents Based on Semi-Supervised Clustering
2009 10th International Conference on Document Analysis and Recognition
| Content Provider | CiteSeerX |
|---|---|
| Author | Hu, J.; Pletschacher, S.; Antonacopoulos, A. |
| Abstract | This paper presents a new semi-supervised clustering framework for the recognition of heavily degraded characters in historical typewritten documents, where off-the-shelf OCR typically fails. The constraints are generated using typographical (collection-independent) domain knowledge and are used to guide both sample (glyph set) partitioning and metric learning. Experimental results using simple features provide encouraging evidence that this approach can lead to significantly improved clustering results compared to simple K-Means clustering, as well as to clustering using a state-of-the-art OCR engine. |
| File Format | |
| Access Restriction | Open |
| Subject Keyword | Heavily Degraded Character; Metric Learning; Semi-supervised Clustering; Historical Typewritten Document; Recognition; New Framework; Off-the-shelf OCR; New Semi-supervised Clustering Framework; State-of-the-art OCR Engine; Document Analysis; Experimental Result; Domain Knowledge; Clustering Result; Simple Feature; 10th International Conference; K-means Clustering |
| Content Type | Text |
| Resource Type | Article |
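
The abstract describes constraints that guide how glyph samples are partitioned. The record does not give the paper's algorithm, so the following is only a minimal sketch of the general idea, written in the style of the well-known COP-KMeans approach to constrained clustering. It is not the authors' method, and it leaves out the metric-learning step entirely. The names `constrained_kmeans`, `violates`, `must_link`, and `cannot_link` are hypothetical, and the toy 2-D points stand in for real glyph features.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def violates(i, cluster, labels, must_link, cannot_link):
    """Return True if assigning sample i to `cluster` breaks a constraint.

    Only partners that already have a label (label != -1) are checked.
    """
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] not in (-1, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False


def constrained_kmeans(points, k, must_link=(), cannot_link=(), iters=20, seed=0):
    """K-Means whose assignment step respects pairwise constraints (assumed sketch)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(iters):
        new_labels = [-1] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; take the first allowed one.
            order = sorted(range(k), key=lambda c: sq_dist(p, centers[c]))
            for c in order:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                raise ValueError("constraints cannot be satisfied")
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Recompute each center as the mean of its members (skip empty clusters).
        for c in range(k):
            members = [points[j] for j, lab in enumerate(labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy usage: samples 0 and 1 lie close together but are forced apart,
# while samples 2 and 3 are forced into the same cluster.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels = constrained_kmeans(points, 2, must_link=[(2, 3)], cannot_link=[(0, 1)])
```

In a setting like the one the abstract describes, the constraint pairs would presumably come from typographical domain knowledge rather than being written by hand as they are here.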