Publication Date:
2016
Abstract:
The paper presents a new framework for discriminating between the Latin and Italian languages. The first phase maps the text in the given language into a uniformly coded text, based on the position of each letter of the script in the text line and its height, derived from its energy profile. The second phase treats the coded text as a 1-D image and extracts run-length texture measures from it, producing a feature vector of 11 values. The resulting feature vectors are then used for language discrimination by means of a clustering algorithm. The two languages are distinguished with an accuracy of 100% on a complex database of documents in Latin and Italian.
Iris type:
4.1 Contribution in conference proceedings
Keywords:
Clustering; Document analysis; Image processing; Information retrieval; Italian language; Statistical analysis
List of contributors:
Brodic, D.; Amelio, A.; Milivojevic, Z. N.
Book title:
CEUR Workshop Proceedings
Published in: