Application of a Probability-Based Algorithm to Extraction of Product Features from Online Reviews
Technical Report CMU-ISRI-06-111, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 2006
Keywords: information extraction, mining, personalization, product review
Prior research has demonstrated the viability of automatically extracting product features from online reviews. This
paper presents a probability-based algorithm and compares it to an existing support-based approach. Specifically, I
used each algorithm to extract features from seven Amazon.com product categories and then asked end users to rate the
features in terms of helpfulness for choosing products. The end users preferred the features identified by the
probability-based algorithm. This algorithm can identify features that comprise a single noun or two successive
nouns (end users rated the latter as more helpful than features comprising only one noun), yet even for collections of
tens of thousands of reviews, it still executes fast enough (around 1 ms per review) for practical use.
Preferred citation: C. Scaffidi. Application of a Probability-Based Algorithm to Extraction of Product Features from Online Reviews. Technical Report CMU-ISRI-06-111, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, 2006.
Entry last updated 2006-06-20.