This example shows how to use the Valence Aware Dictionary and sEntiment Reasoner (VADER) algorithm for sentiment analysis.
The VADER algorithm uses a list of annotated words (the sentiment lexicon), where each word has a corresponding sentiment score. The algorithm also uses word lists that modify the scores of the tokens that follow them in the text:
Boosters – words or n-grams that boost the sentiment of subsequent tokens. For example, words like "absolutely" and "amazingly".
Dampeners – words or n-grams that dampen the sentiment of subsequent tokens. For example, words like "hardly" and "somewhat".
Negations – words that negate the sentiment of subsequent tokens. For example, words like "not" and "isn't".
To evaluate sentiment in text, use the vaderSentimentScores function.
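As a rough illustration of how the modifier lists act, the sketch below scores four made-up sentences that differ only in the word before "good". The sentences are hypothetical and not part of the example data. The code uses the default lexicon and modifier lists, and shows no scores because the exact values depend on the lexicon shipped with your release.

```matlab
% Hypothetical sentences: plain, boosted, dampened, and negated.
str = [
    "The food was good."
    "The food was absolutely good."
    "The food was somewhat good."
    "The food was not good."];
documents = tokenizedDocument(str);

% Compound scores with the default lexicon and modifier lists.
scores = vaderSentimentScores(documents);

% Show each sentence next to its compound score.
table(str,scores,'VariableNames',{'Text','CompoundScore'})
```

If the default lists behave as described above, the boosted sentence should score higher than the plain one, the dampened sentence lower, and the negated sentence should change sign.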
Extract the text data in the file weekendUpdates.xlsx using readtable. The file weekendUpdates.xlsx contains status updates containing the hashtags "#weekend" and "#vacation".
filename = "weekendUpdates.xlsx";
tbl = readtable(filename,'TextType','string');
head(tbl)
ans=8×2 table
    ID    TextData
    __    _________________________________________________________________________________
    1     "Happy anniversary! ❤ Next stop: Paris! β #vacation"
    2     "Haha, BBQ on the beach, engage smug mode! π π ❤ π #vacation"
    3     "getting ready for Saturday night π #yum #weekend π"
    4     "Say it with me - I NEED A #VACATION!!! ☹"
    5     "π Chilling π at home for the first time in ages…This is the life! π #weekend"
    6     "My last #weekend before the exam 😢 π."
    7     "can’t believe my #vacation is over 😢 so unfair"
    8     "Can’t wait for tennis this #weekend 🎾ππ₯ π"
Create an array of tokenized documents from the text data and view the first few documents.
str = tbl.TextData;
documents = tokenizedDocument(str);
documents(1:5)
ans =
  5x1 tokenizedDocument:

    11 tokens: Happy anniversary ! ❤ Next stop : Paris ! β #vacation
    16 tokens: Haha , BBQ on the beach , engage smug mode ! π π ❤ π #vacation
     9 tokens: getting ready for Saturday night π #yum #weekend π
    13 tokens: Say it with me - I NEED A #VACATION ! ! ! ☹
    19 tokens: π Chilling π at home for the first time in ages … This is the life ! π #weekend
Evaluate the sentiment of the tokenized documents using the vaderSentimentScores function. Scores close to 1 indicate positive sentiment, scores close to -1 indicate negative sentiment, and scores close to 0 indicate neutral sentiment.
compoundScores = vaderSentimentScores(documents);
View the scores of the first few documents.
compoundScores(1:5)
ans = 5×1
0.4738
0.9348
0.6705
-0.5067
0.7345
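The compound score condenses each document into a single number. As a minimal sketch, assuming the same weekendUpdates.xlsx file is on the path, vaderSentimentScores can also return the positive, negative, and neutral scores, which can help when two documents have similar compound scores for different reasons. The variable name scoreTbl is illustrative only.

```matlab
% Reload the example data so that this snippet runs on its own.
tbl = readtable("weekendUpdates.xlsx",'TextType','string');
documents = tokenizedDocument(tbl.TextData);

% Request all four outputs: compound, positive, negative, and neutral scores.
[compoundScores,positiveScores,negativeScores,neutralScores] = ...
    vaderSentimentScores(documents);

% Collect the scores in a table and view the first five rows.
scoreTbl = table(compoundScores,positiveScores,negativeScores,neutralScores);
head(scoreTbl,5)
```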
Visualize the text with positive and negative sentiment in word clouds.
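% Split the text by the sign of the compound score. Documents that
% score exactly 0 fall into the "negative" group.
idx = compoundScores > 0;
strPositive = str(idx);
strNegative = str(~idx);

% Plot the two groups side by side.
figure
subplot(1,2,1)
wordcloud(strPositive);
title("Positive Sentiment")
subplot(1,2,2)
wordcloud(strNegative);
title("Negative Sentiment")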
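Splitting on the sign of the score puts neutral documents in the negative group. One possible refinement, sketched below, is to treat scores near zero as neutral. The cutoff of 0.05 is an assumption borrowed from common VADER usage, not something this example prescribes, and the variable names threshold and labels are illustrative.

```matlab
% Reload the example data so that this snippet runs on its own.
tbl = readtable("weekendUpdates.xlsx",'TextType','string');
str = tbl.TextData;
compoundScores = vaderSentimentScores(tokenizedDocument(str));

% Assumed cutoff: scores within +/-0.05 of zero count as neutral.
threshold = 0.05;

% Start with every document labeled neutral, then relabel the rest.
labels = repmat("neutral",size(compoundScores));
labels(compoundScores >=  threshold) = "positive";
labels(compoundScores <= -threshold) = "negative";

% Convert to categorical and display the number of documents per class.
labels = categorical(labels,["negative" "neutral" "positive"]);
summary(labels)
```

The text for any one class, for example str(labels == "negative"), can then be passed to wordcloud in the same way as above.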
See also: ratioSentimentScores | tokenizedDocument | vaderSentimentScores