Visualize Twitter Data with Power BI

Yannawut Kimnaruk
5 min read · Jun 17, 2022

Power BI is a powerful tool for visualizing data. It ships with many built-in visuals, so you can simply drag and drop data fields to create beautiful plots. However, when it comes to text data such as customer feedback or Twitter posts, it might not be easy to visualize the data and find insights in it.

In this article, I will show an example of a text data dashboard from Twitter during the Covid19 pandemic.

💽 Dataset

Data used in this article comes from

The data was collected from Twitter during the Covid19 pandemic, and each tweet's sentiment is classified as extremely positive, positive, neutral, negative, or extremely negative.

This dataset contains 41157 rows and 5 columns.

data sample

📊 Visualization

This is a snapshot of my dashboard.

It could be divided into 4 parts as shown below.

Let’s see how to create each part!

1. Number of Tweets by Sentiment

This plot gives an overview of the data, such as how many tweets are positive or negative.

It can also be used as a filter: when users click on a sentiment, the other plots change accordingly. Users can therefore analyze the characteristics of each sentiment one after another.

It is an easy plot.

In the Visualizations pane, click on the Stacked bar chart icon and select the Sentiment column in the Fields pane (I grouped the sentiments into 3 groups, but it is OK not to do so).

Finish.

Note: Best practice is to create a new measure that counts the tweets per sentiment and to visualize that measure.
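
The three-group version of the Sentiment column mentioned above can be prepared in many ways. As a minimal sketch in plain Python, assuming the five labels are spelled as in the dataset description (the exact spelling, the group names, and the helper name group_sentiment are all assumptions), the mapping and the per-group count could look like this:

```python
from collections import Counter

# Assumed label spellings; adjust them to match the actual Sentiment column.
SENTIMENT_GROUPS = {
    "extremely positive": "Positive",
    "positive": "Positive",
    "neutral": "Neutral",
    "negative": "Negative",
    "extremely negative": "Negative",
}

def group_sentiment(label):
    """Map one of the five sentiment labels to one of three groups (hypothetical helper)."""
    return SENTIMENT_GROUPS[label.strip().lower()]

# Made-up labels, only to show the counting step.
sample_labels = ["Positive", "Extremely Negative", "Neutral", "Extremely Positive"]
group_counts = Counter(group_sentiment(label) for label in sample_labels)
print(group_counts)
```

The same grouping can of course be done inside Power BI itself; the sketch only shows the logic.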

2. Word Cloud

A word cloud is an elegant visualization of how often each word occurs. The larger a word is drawn, the more frequently it appears in the dataset.

You can find an article about how to create a word cloud at the link below.

3. Number of Hashtags

The objective of this plot is to show how often each hashtag is used in the tweets. (Most of them are about covid and its synonyms lol)

This plot is created with a Python visual. To keep this article short, I will include a link to another article that describes the Python setup and the Python visual in Power BI in detail.

If you have already set up Python in Power BI, you can go to the ‘Use Python to visualize’ part.

In the Python script editor area, copy and paste the code below.

# The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:
# dataset = pandas.DataFrame(OriginalTweet)
# dataset = dataset.drop_duplicates()

# Import required libraries and set the plot font scale
import re
import matplotlib.pyplot as plt
import seaborn as sns

sns.set(font_scale=7)

# Create a function to find hashtags
def find_hash(text):
    # Collect every word that directly follows a '#' and join them with spaces
    line = re.findall(r'(?<=#)\w+', text)
    return " ".join(line)

# Apply the function to every tweet
dataset['hash'] = dataset['OriginalTweet'].apply(find_hash)

# Skip the most frequent value (the empty string for tweets without hashtags)
# and keep the next 10
temp = dataset['hash'].value_counts().iloc[1:11]
# Name the columns explicitly so the result does not depend on the pandas version
temp = temp.rename_axis('Hashtag').reset_index(name='count')

# Plot graph
plt.figure(figsize=(120, 25))
sns.barplot(x="Hashtag", y="count", data=temp)
plt.show()

Code explanation

  • Create a function ‘find_hash’ that finds all hashtags matching the regular expression pattern and returns them as one space-separated string.
  • Create a new DataFrame column, ‘hash’, which is the result of applying the ‘find_hash’ function to the ‘OriginalTweet’ column.
  • Count how often each ‘hash’ value occurs, skip the first entry (tweets without any hashtag), and keep the next 10 in temp.
  • Visualize with a seaborn barplot.
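
Note that the script counts the combined hashtag string of each tweet, so a tweet with two hashtags is counted as one combination. If you would rather count each hashtag on its own, a plain-Python sketch could look like the following. The function name count_hashtags and the sample tweets are made up for illustration, and lower-casing the tags is my own choice, not something the script above does.

```python
import re
from collections import Counter

def count_hashtags(tweets, top_n=10):
    """Return the top_n most common individual hashtags (hypothetical helper)."""
    counts = Counter()
    for text in tweets:
        # Same pattern as in the script above: words that directly follow '#'
        tags = re.findall(r'(?<=#)\w+', text)
        counts.update(tag.lower() for tag in tags)
    return counts.most_common(top_n)

# Invented tweets, only to show the behavior.
sample_tweets = [
    "Stay home #covid19 #StaySafe",
    "Shelves are empty again #Covid19",
    "No hashtag here",
]
print(count_hashtags(sample_tweets))  # [('covid19', 2), ('staysafe', 1)]
```

With this approach there is no empty entry to skip, because tweets without hashtags simply add nothing to the counter.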

4. Number of Words

The number of words in a tweet may correlate with the poster's feelings. For example, someone who feels bad may type a long sentence to explain that feeling (this is only a hypothesis anyway).

Before creating this graph, I added a new column called Count_Word.

  • Go to Transform data.
  • Under the Add Column tab, click Custom Column.
  • You will see a Custom Column window. Type a column name (Count_Word) and the formula below, then click OK.
List.Count(Text.Split([OriginalTweet]," "))
  • In the Home tab, click Close & Apply.
  • In the Visualizations pane, select the Stacked column chart and set Axis, Legend, and Values as illustrated in the image below.
  • You can try the stacked column chart, clustered column chart, or box plot as well.
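
For comparison, the same count can be sketched in Python. The formula above splits on a single space, so, as far as I can tell, consecutive spaces produce empty items that are counted as well. Python's str.split(" ") behaves the same way, while str.split() with no argument ignores repeated whitespace. Both function names below are hypothetical.

```python
def count_words_like_formula(text):
    # Mirrors splitting on a single space: empty items are counted too
    return len(text.split(" "))

def count_words(text):
    # Splits on any run of whitespace, so repeated spaces are ignored
    return len(text.split())

tweet = "Stay  safe everyone"  # note the double space
print(count_words_like_formula(tweet), count_words(tweet))  # 4 3
```

Which variant is better depends on how clean the tweet text is; for a rough length comparison between sentiments, either should do.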

Limitation of Power BI

Although you can get an overview of the text data and interact with it in the Power BI dashboard, there are some limitations in data analysis. For example, I can’t find an easy way to build an N-gram visualization with Power BI alone, but it can be done in Python.
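
To give an idea of the Python side, here is a minimal sketch of counting bigrams (N-grams with N = 2) using only the standard library. The tokenization is deliberately naive (lower-casing and splitting on whitespace), and the function name and sample texts are invented for illustration.

```python
from collections import Counter

def count_ngrams(texts, n=2):
    """Count n-grams over naively tokenized texts (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        tokens = text.lower().split()
        # Slide a window of n tokens across the text
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Invented texts, only to show the behavior.
sample_texts = ["panic buying toilet paper", "stop panic buying"]
print(count_ngrams(sample_texts).most_common(1))  # [('panic buying', 2)]
```

The resulting counts could then be drawn as a bar chart in a Python visual, in the same way as the hashtag plot.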

Conclusion

A dashboard for Twitter data can include a sentiment overview, a word cloud, a hashtag chart, and a word-count chart. The advantage of Power BI is that the dashboard is interactive, while the disadvantage is that it is not as flexible as creating charts in Python.

If you want to see any dashboard example, leave a comment below.

Please follow me for more Power BI articles!!!
