Developing an efficient algorithm for representation and compression of large Bengali text

Marjan, MA; UDDIN, MD PALASH; Afjal, MI; Haque, MD

File(s) under permanent embargo

conference contribution

posted on 2023-04-28, 06:31 authored by MA Marjan, MD PALASH UDDIN, MI Afjal, MD Haque

Efficient coding is one of the challenging aspects of information and communication theory. Natural languages such as Bengali are encoded using Unicode, which requires more space and therefore more time to transfer text in that language. In this paper, we propose a novel algorithm to represent Bengali text efficiently and then compress it with a better compression ratio. Each Bengali character is represented by a unique 2-digit intermediate decimal value. After indexing and sorting all the word values, successive subtraction is performed on them in the hope of reducing the magnitude of the numbers. The new value of each word can then be encoded with very few bits. In comparison to other compressors, the compression ratio of the proposed algorithm decreases substantially for large texts, which are likely to contain more duplicate or redundant words, more words of the same length, and more words of the same length that share a prefix (called Uposorgo in Bengali).
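
The steps named in the abstract (2-digit character codes, word values, sorting, successive subtraction) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the code table built here is hypothetical, since the paper's actual character mapping is not given on this page, and storing each distinct word once with a list of positions is an assumption about what "indexing" means. The final bit-level encoding of the differences is left out.

```python
from itertools import accumulate


def build_code_table(text):
    """Assign every distinct non-space character a 2-digit code (10..99).

    Hypothetical mapping for illustration only; starting at 10 keeps
    every code exactly two digits wide, so word values decode unambiguously.
    """
    chars = sorted({ch for ch in text if not ch.isspace()})
    if len(chars) > 90:
        raise ValueError("more than 90 distinct characters; 2-digit codes exhausted")
    return {ch: 10 + i for i, ch in enumerate(chars)}


def word_to_value(word, table):
    """Concatenate the 2-digit codes of a word into one decimal integer."""
    value = 0
    for ch in word:
        value = value * 100 + table[ch]
    return value


def value_to_word(value, table):
    """Invert word_to_value by peeling off two decimal digits at a time."""
    reverse = {code: ch for ch, code in table.items()}
    chars = []
    while value > 0:
        value, code = divmod(value, 100)
        chars.append(reverse[code])
    return "".join(reversed(chars))


def delta_encode(values):
    """Sort the distinct word values and keep only successive differences.

    Returns (deltas, positions): deltas[0] is the smallest value, each later
    entry is the gap to the previous value, and positions[i] is the index in
    the sorted list of the i-th word of the text (assumed indexing scheme).
    """
    ordered = sorted(set(values))
    if not ordered:
        return [], []
    deltas = [ordered[0]] + [b - a for a, b in zip(ordered, ordered[1:])]
    index = {v: i for i, v in enumerate(ordered)}
    return deltas, [index[v] for v in values]


def delta_decode(deltas, positions):
    """Rebuild the original sequence of word values from deltas and positions."""
    ordered = list(accumulate(deltas))
    return [ordered[p] for p in positions]


text = "বাংলা ভাষা বাংলা ভাষার"
words = text.split()
table = build_code_table(text)
values = [word_to_value(w, table) for w in words]
deltas, positions = delta_encode(values)
restored = [value_to_word(v, table) for v in delta_decode(deltas, positions)]
```

In this sketch a repeated word costs only one extra position entry, and words of the same length with a common prefix produce small differences after sorting, which is consistent with the abstract's remark that such texts compress better.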

History

Pagination

22-25

Location

Cox's Bazar, Bangladesh

Publisher DOI

https://doi.org/10.1109/IFOST.2014.6991063

Start date

2014-10-21

End date

2014-10-23

ISBN-13

9781479960620

Language

eng

Publication classification

E1.1 Full written paper - refereed

Title of proceedings

IFOST 2014 : Proceedings of the 9th International Forum on Strategic Technology

Event

Strategic Technology. International Forum (2014 : 9th : Cox's Bazar, Bangladesh)

Publisher

IEEE

Place of publication

Piscataway, N.J.

