Sona College of Technology, India
Four centuries ago, knowledge was transferred from experts to learners mostly through oral recitation. Script-based knowledge transfer relied on scribing on rocks, metals, cloth and leaves. Because palm leaves were abundantly available, the southern part of India mainly used them as the base material for scribing. These ancient methods of writing and protecting manuscripts vanished over time. Today, owing to climatic changes and aging, these manuscripts have started to degrade. To preserve the information present on the palm leaves, digital images of each leaf have been captured and stored. We have developed a ‘Semi-Automatic System’ with primary emphasis on Tamil manuscripts. To retain only the textual information for storage and segmentation, these images must be binarized. Binarization of degraded document images remains a challenge for many computer scientists. The binarized characters are segmented and fed to a Character Recognition module that converts the handwritten text into a printed version. The resulting printed version cannot be assumed to be error-free, so a linguistic scholar capable of reading both the handwritten and printed versions needs to verify and confirm the final printed version. Even so, this drastically reduces the time required compared with the current practice. Our module will be of great use to linguistic scholars, allowing quicker transcription of texts from palm leaves and thereby making it easier to write a variety of annotations on a single text.
Palm leaf, binarization, character segmentation, CCCMA (Combined Connected Component and Minimal Area)