This work demonstrates how Python modules can be used to scrape web data (specifically, Amazon product reviews of Sony Xperia handsets), run text and language analysis to understand and illustrate the key features of the reviews, and finally test different modelling techniques for predicting sentiment.
You'll find the necessary notebooks on Yoon's GitHub page.
The notebook for scraping the reviews can be found here.
The notebook for the text analysis and sentiment model can be found here.
In this second notebook, the following Python modules are used:
Pandas - data manipulation and data analysis
Matplotlib / seaborn / wordcloud - data visualisation
re (regex) - identify patterns
scikit-learn (sklearn) - feature extraction / statistical modelling / model validation / model measurement
NLTK - language processing
The topics covered in this second notebook can be summarised as follows:
Data quality checks
Derive new features
Preliminary data analysis with wordcloud and charts
Text features extraction
Sentiment modelling
List important features from the best model
Further development
If you have any feedback or questions, Yoon will be happy to help!
Yoonkang Low is a freelance Analytical and Data Science consultant with over 9 years' experience, including time with analytical agencies and at Amazon.