Python Word Cloud in power bi

EnanBahadur · July 5, 2024, 5:05pm

Hi Experts,

I am doing the NLP course of python in power bi which is great source to enhance the skill. since in my work i have lot of data related to survey.
I just wanted to ask how to avoid the blank space in around the word cloud.
i tried to adjust the height and width and still blank spaces comes.
Can someone please assist.

attached pic for reference.

and below code of Python is running in power bi:

The following code to create a dataframe and remove duplicated rows is always executed and acts as a preamble for your script:

dataset = pandas.DataFrame(country, description, designation, points, price, province, region_1, region_2, taster_name, taster_twitter_handle, title, variety, winery)

dataset = dataset.drop_duplicates()

Paste or type your script code here:

#Load the essential Libraries
import pandas as pd
import numpy as np
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
from nltk import ngrams
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter
from wordcloud import WordCloud
from PIL import Image

Create a data cleaning Function to remove stop words, Punctuation and Lemmatize

def big_clean_text(sentence):
sentence = " ".join(sentence)
new_tokens = word_tokenize(sentence)
new_tokens = [t.lower() for t in new_tokens]
new_tokens = [t for t in new_tokens if t not in stopwords.words(‘english’)]
#new_tokens = [t for t in new_tokens if t not in bad_words]
new_tokens = [t for t in new_tokens if t.isalpha()]
lemmatizer = WordNetLemmatizer()
new_tokens = [lemmatizer.lemmatize(t) for t in new_tokens]
counted = Counter(new_tokens)
counted_2 = Counter(ngrams(new_tokens,2))
counted_3 = Counter(ngrams(new_tokens,3))
word_freq = pd.DataFrame(counted.items(),columns = [‘word’,‘frequency’])
bi_gram_freq = pd.DataFrame(counted_2.items(),columns = [‘bi-gram’,‘frequency’])
tri_gram_freq = pd.DataFrame(counted_3.items(),columns = [‘Tri-gram’,‘frequency’])
return new_tokens

Run the function

tokens = big_clean_text(dataset[‘description’])

joining all the words

bag_of_words = " ".join(tokens)

Creating Wordcloud

plt.figure(figsize=(16,9))
plt.axis(‘off’)
word_cloud = WordCloud(width=800, height=400, max_words=500, stopwords=[‘wine’,‘syrah’,‘flavor’]).generate(bag_of_words)
plt.imshow(word_cloud)
plt.show()

Keith · July 5, 2024, 5:27pm

Hi @EnanBahadur

Please give DataMentor/EDNA AI tools a try to help you with you solve your issues.

Also, just to let you know when you are doing the course if you are stuck on something you can ask the question within the course but selecting the Ask and Learn icon.

thanks
Keith