A new image analysis framework for Latin and Italian language discrimination

Conference Paper

Publication Date:

2016

Abstract:

The paper presents a new framework for discriminating between the Latin and Italian languages. The first phase maps the text in the given language into a uniformly coded text, based on the position of each letter of the script in the text line and its height, derived from its energy profile. The second phase treats the coded text as a 1-D image and extracts run-length texture measures from it, producing a feature vector of 11 values. A clustering algorithm then uses the obtained feature vectors to discriminate between the languages. As a result, the two languages are distinguished with an accuracy of 100% on a complex database of documents in Latin and Italian.
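
The two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the letter classes (`ASCENDERS`, `DESCENDERS`), the three-value coding, and the function names are hypothetical stand-ins for the position-and-height coding. Only three standard run-length statistics are computed rather than the full 11-value feature vector, and the clustering step is omitted.

```python
from itertools import groupby

# Hypothetical letter classes by vertical extent in the text line
# (an assumption for illustration, not the coding used in the paper).
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def code_text(text):
    """Phase 1 (sketch): map each letter to a code.

    0 = short letter, 1 = ascender or capital, 2 = descender.
    Non-letter characters are skipped.
    """
    codes = []
    for ch in text:
        if not ch.isalpha():
            continue
        if ch.isupper() or ch in ASCENDERS:
            codes.append(1)
        elif ch in DESCENDERS:
            codes.append(2)
        else:
            codes.append(0)
    return codes


def runs(codes):
    """Return (value, run_length) pairs for a 1-D coded sequence."""
    return [(value, len(list(group))) for value, group in groupby(codes)]


def run_length_features(codes):
    """Phase 2 (sketch): three run-length texture statistics.

    SRE = short run emphasis, LRE = long run emphasis,
    RP  = run percentage (number of runs / sequence length).
    """
    run_list = runs(codes)
    n_runs = len(run_list)
    if n_runs == 0:
        raise ValueError("coded sequence is empty")
    sre = sum(1.0 / (length * length) for _, length in run_list) / n_runs
    lre = sum(length * length for _, length in run_list) / n_runs
    rp = n_runs / len(codes)
    return {"SRE": sre, "LRE": lre, "RP": rp}


if __name__ == "__main__":
    sample = "Gallia est omnis divisa in partes tres"
    print(run_length_features(code_text(sample)))
```

In a full pipeline, one such feature vector per document would be passed to a clustering algorithm; the abstract does not name the algorithm, so none is assumed here.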

Iris type:

4.1 Contribution in conference proceedings

Keywords:

Clustering; Document analysis; Image processing; Information retrieval; Italian language; Statistical analysis

List of contributors:

Brodic, D.; Amelio, A.; Milivojevic, Z. N.

Authors of the University:

AMELIO Alessia

Handle:

https://ricerca.unich.it/handle/11564/770248

Book title:

CEUR Workshop Proceedings

Published in:

CEUR WORKSHOP PROCEEDINGS
