Automated analysis of the ever-increasing number of reviews available through the Web can enable businesses to identify why people like or dislike (aspects of) products or brands. To this end, a reliable indication of the intended sentiment of reviews is of crucial importance. This sentiment is typically quantified in universal star ratings, which are not always available. We propose several statistical methods for automatically classifying the star ratings of reviews and compare their performance. Reviews are represented as binary vectors, with features signaling the presence of sentiment-carrying words. A nearest neighbor classifier maximizes recall, whereas a naïve Bayes classifier excels in terms of precision, accuracy, and the root mean squared error of the assigned number of stars.
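The setup described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the four-word lexicon, the toy training reviews, the Hamming distance used for the nearest neighbor search, the Laplace smoothing in the naïve Bayes estimates, and all function names are assumptions made for illustration only.

```python
import math
from collections import defaultdict

# Hypothetical lexicon of sentiment-carrying words (an assumption, not from the paper).
LEXICON = ["great", "good", "poor", "terrible"]

def to_binary_vector(review):
    """Binary representation: 1 if the lexicon word occurs in the review, else 0."""
    tokens = set(review.lower().split())
    return [1 if word in tokens else 0 for word in LEXICON]

def nearest_neighbor(train, x):
    """Return the star rating of the training vector closest to x (Hamming distance assumed)."""
    closest = min(train, key=lambda pair: sum(a != b for a, b in zip(pair[0], x)))
    return closest[1]

def train_naive_bayes(train):
    """Estimate class priors and per-feature Bernoulli likelihoods (Laplace smoothing assumed)."""
    by_stars = defaultdict(list)
    for vec, stars in train:
        by_stars[stars].append(vec)
    model = {}
    for stars, vecs in by_stars.items():
        prior = len(vecs) / len(train)
        probs = [(sum(v[i] for v in vecs) + 1) / (len(vecs) + 2)
                 for i in range(len(LEXICON))]
        model[stars] = (prior, probs)
    return model

def predict_naive_bayes(model, x):
    """Return the star rating with the highest log-posterior for vector x."""
    def log_posterior(stars):
        prior, probs = model[stars]
        return math.log(prior) + sum(
            math.log(p if xi else 1 - p) for p, xi in zip(probs, x))
    return max(model, key=log_posterior)

def rmse(predicted, actual):
    """Root mean squared error between predicted and actual star ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Toy training data, invented purely for illustration.
REVIEWS = [
    ("great product good value", 5),
    ("good", 4),
    ("poor quality", 2),
    ("terrible and poor", 1),
]
TRAIN = [(to_binary_vector(text), stars) for text, stars in REVIEWS]
MODEL = train_naive_bayes(TRAIN)
```

With these toy data, calling `nearest_neighbor(TRAIN, to_binary_vector("terrible poor service"))` or `predict_naive_bayes(MODEL, to_binary_vector("terrible poor service"))` assigns one star, and `rmse` can then compare assigned and actual ratings over a held-out set.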

doi.org/10.1007/978-3-642-30864-2_24, hdl.handle.net/1765/57640
Erasmus School of Economics

Hogenboom, A., Boon, F., & Frasincar, F. (2012). A statistical approach to star rating classification of sentiment. doi:10.1007/978-3-642-30864-2_24