Development of Indonesian automated document reader: evaluation of text segmentation algorithms
In developing countries such as Indonesia, textual information is carried mostly on paper. Such information, however, is not accessible to citizens with visual impairments. To assist them, the Agency for the Assessment and Application of Technology (Badan Pengkajian dan Penerapan Teknologi; BPPT) is developing the Indonesian Automated Document Reader (I-ADR), which converts textual information on paper documents to speech. This research develops a prototype of I-ADR featuring OCR, Text Summarization, and Text-to-Speech (TTS) Synthesizer modules. The main focus is the Text Segmentation module, an integral part of OCR. In this study, several Text Segmentation algorithms for grayscale and color images are developed and evaluated. Text segmentation for grayscale images uses an improved version of Enhanced CRLA (Sun, 2006), while segmentation for color images employs the Multivalued Image Decomposition algorithm (Jain and Yu, 1998) combined with the improved Enhanced CRLA. In the experiments, the success rate is 100% for grayscale images and 96.35% for color images.