27/03/2021

[Python] Sentiment analysis with NLTK

NLTK is an interesting and powerful Python library for text analysis. A common use case is running sentiment analysis on a piece of text.


While the configuration can go very deep, here is a sample application that analyses the sentiment of online posts written in English. The first time you execute your program, NLTK will complain about missing data; download the packages that fit your use case with nltk.download() (here, the stopwords corpus and the VADER lexicon).

 import nltk
 from nltk.sentiment import SentimentIntensityAnalyzer

 # one-time download of the required data
 nltk.download("stopwords")
 nltk.download("vader_lexicon")

 # common words that have no impact on the overall analysis
 stopwords = nltk.corpus.stopwords.words("english")
 sia = SentimentIntensityAnalyzer()

 # STRING holds the text of the post to analyse;
 # strip it down to the relevant words (the stopword list is all lowercase)
 text = " ".join(word for word in STRING.split() if word.lower() not in stopwords)

 # overall sentiment of this post, from -1 (most negative) to +1 (most positive)
 sentiment = sia.polarity_scores(text)["compound"]

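The compound value is a single normalized score. As a minimal sketch of how you might turn it into a label, the ±0.05 cutoffs below are a common convention rather than anything enforced by NLTK, so tune them for your own data:

 # rough classification of the compound score;
 # the 0.05 thresholds are an assumption based on common practice
 if sentiment >= 0.05:
     label = "positive"
 elif sentiment <= -0.05:
     label = "negative"
 else:
     label = "neutral"
 print(label, sentiment)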

An honorable mention goes to the pyenchant library, which uses Enchant spellchecker dictionaries and lets you check whether a string is a valid word in a given language. Additionally, it will match words even with unusual casing:

 import enchant

 english_words = enchant.Dict("en_US")
 if english_words.check(STRING):
     # STRING is a valid English word
     ...

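As a rough illustration of combining the two ideas, the same check can be used to keep only recognised English words before running the sentiment analysis. POST is a hypothetical variable holding the raw text of an online post, and in a real application you would strip punctuation first, since a trailing comma or period makes the dictionary check fail:

 import enchant

 english_words = enchant.Dict("en_US")

 # POST is a hypothetical variable holding the raw text of an online post
 cleaned = " ".join(word for word in POST.split() if english_words.check(word))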
