Keyword extraction for text categorization

An, Jiyuan; Chen, Yi-Ping Phoebe

conference contribution

posted on 2005-01-01, 00:00 authored by Jiyuan An, Yi-Ping Phoebe Chen

Text categorization (TC) is one of the main applications of machine learning. Many methods have been proposed, such as the Rocchio method, naive Bayes-based methods, and SVM-based text classification. These methods learn from labeled text documents and construct a classifier, which can then predict the category of a new text document. However, these methods do not give a description of each category. The machine learning field offers many concept learning algorithms, such as ID3 and CN2. This paper proposes a more robust algorithm for inducing concepts from training examples, based on enumerating all possible keyword combinations. Experimental results show that the rules produced by our approach are more precise and simpler than those of other methods.
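
The idea of inducing category descriptions by enumerating keyword combinations can be illustrated with a small sketch. This is not the authors' algorithm, whose details are not given in the abstract; it is a minimal brute-force illustration under stated assumptions. The function name `induce_rules`, the parameters `max_size` and `min_precision`, and the toy documents are all hypothetical.

```python
from itertools import combinations

def induce_rules(docs, target, max_size=2, min_precision=1.0):
    """Illustrative brute-force rule induction (hypothetical, not the paper's method).

    docs: list of (set_of_words, label) pairs.
    Returns a list of (keywords, precision, coverage) tuples for the target label.
    """
    vocab = sorted(set().union(*(words for words, _ in docs)))
    n_target = sum(1 for _, label in docs if label == target)
    rules = []
    # Enumerate smaller combinations first so that simpler rules are found first.
    for size in range(1, max_size + 1):
        for combo in combinations(vocab, size):
            keywords = set(combo)
            # Skip combinations that only extend an already accepted rule.
            if any(set(rule[0]) <= keywords for rule in rules):
                continue
            # Labels of all documents that contain every keyword in the combination.
            matched = [label for words, label in docs if keywords <= words]
            if not matched:
                continue
            hits = sum(1 for label in matched if label == target)
            precision = hits / len(matched)
            if hits and precision >= min_precision:
                rules.append((combo, precision, hits / n_target))
    return rules

# Toy training set (invented for illustration only).
docs = [
    ({"ball", "goal", "team"}, "sport"),
    ({"ball", "match", "team"}, "sport"),
    ({"vote", "party", "team"}, "politics"),
    ({"vote", "election", "ball"}, "politics"),
]
rules = induce_rules(docs, "sport")
for keywords, precision, coverage in rules:
    print(" AND ".join(keywords), "=> sport", f"(precision={precision:.2f}, coverage={coverage:.2f})")
```

Each returned rule reads as a human-readable description of the category, which is the property the abstract says plain classifiers lack. Exhaustive enumeration grows combinatorially with vocabulary size, so a practical version would need pruning beyond the simple superset check used here.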

History

Event

Active Media Technology. Conference (2005: Kagawa, Japan)

Pagination

556 - 561

Publisher

IEEE

Location

Kagawa, Japan

Place of publication

Piscataway, N.J.

Start date

2005-05-19

End date

2005-05-21

ISBN-13

9780780390355

ISBN-10

0780390350

Language

eng

Notes

This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.

Publication classification

E1 Full written paper - refereed

Copyright notice

2005 IEEE.

Editor/Contributor(s)

H Tarumi, Y Li, T Yoshida

Title of proceedings

Proceedings of the 2005 International Conference on Active Media Technology
