Regex-based entity extraction with active learning and genetic programming
| Content Provider | ACM Digital Library |
|---|---|
| Author | Medvet, Eric; Tarlao, Fabiano; Bartoli, Alberto; De Lorenzo, Andrea |
| Abstract | We consider the long-standing problem of the automatic generation of regular expressions for text extraction, based solely on examples of the desired behavior. We investigate several active learning approaches in which the user annotates only one desired extraction and then merely answers extraction queries generated by the system. The resulting framework is attractive because it is the system, not the user, which digs out the data in search of the samples most suitable to the specific learning task. We tailor our proposals to a state-of-the-art learner based on Genetic Programming and we assess them experimentally on a number of challenging tasks of realistic complexity. The results indicate that active learning is indeed a viable framework in this application domain and may thus significantly decrease the amount of costly annotation effort required. |
| Starting Page | 7 |
| Ending Page | 15 |
| Page Count | 9 |
| File Format | |
| ISSN | 1559-6915, 1931-0161 |
| DOI | 10.1145/2993231.2993232 |
| Journal | ACM SIGAPP Applied Computing Review (SIAP) |
| Volume Number | 16 |
| Issue Number | 2 |
| Language | English |
| Publisher | Association for Computing Machinery (ACM) |
| Publisher Date | 2013-06-01 |
| Publisher Place | New York |
| Access Restriction | One Nation One Subscription (ONOS) |
| Subject Keyword | Entity extraction; Machine learning; Information extraction; Programming by examples |
| Content Type | Text |
| Resource Type | Article |