NLTK is a powerful Python library for text analysis, and one of its most common use cases is sentiment analysis. While the configuration can go much deeper, here is a sample application that analyses the sentiment of online posts written in English. The first time you run your program, NLTK will ask you to download the data it needs; fetch the packages that fit your use case.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

# download the required data on first run (a no-op once installed)
nltk.download("vader_lexicon")
nltk.download("stopwords")

# common words that have no impact on the overall analysis
stopwords = nltk.corpus.stopwords.words("english")
sia = SentimentIntensityAnalyzer()

STRING = "..."  # the text of the post to analyse

# strip the STRING text down to relevant words only
# (the stopword list is provided entirely in lowercase)
text = " ".join(word for word in STRING.split() if word.lower() not in stopwords)

# overall sentiment of this post, from -1 (most negative) to +1 (most positive)
sentiment = sia.polarity_scores(text)["compound"]
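The compound score by itself doesn't tell you whether a post counts as positive or negative; the VADER authors suggest thresholding it at roughly ±0.05. A minimal helper along those lines (the function name and exact cut-offs are my own choice):

```python
def classify(compound: float) -> str:
    """Map a VADER compound score (-1..+1) to a coarse label."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(classify(0.6))   # positive
print(classify(-0.4))  # negative
print(classify(0.0))   # neutral
```

You can tighten or loosen the thresholds depending on how strict you want the "neutral" band to be.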
An honorable mention goes to the pyenchant library, which wraps the Enchant spellchecker dictionaries and lets you check whether a string is a valid word in a given language. It also matches words regardless of unusual casing:
import enchant

english_words = enchant.Dict("en_US")

STRING = "Hello"  # the word to check (placeholder)
if english_words.check(STRING):
    # STRING is a valid English word
    print(f"{STRING} is an English word")