Implementing N-Grams + Python + Power BI

Vishy · February 19, 2023, 7:58pm

Hi team,

I want to implement an N-grams visual in Power BI.
Attached are the PBIX file and the code I found online, which fails in my case. Could you please help me with this?
test.pbix (30.9 KB)

I used the code from https://towardsdatascience.com/create-an-n-gram-ranking-in-power-bi-b27ba076366

Keith · February 19, 2023, 8:16pm

Please provide more information. What kind of error are you getting? Do you have Python installed on your computer?
There is not much information to go on.

Vishy · February 19, 2023, 8:21pm

@Keith - Yes, I have Python installed on my machine. The error is below:

Error Message:
Python script error.
LookupError:

Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('stopwords')

For more information see: https://www.nltk.org/data.html

Attempted to load corpora/stopwords.zip/stopwords/

Searched in:
- 'C:\Users\F5294746/nltk_data'
- 'C:\Users\F5294746\AppData\Local\Programs\Python\Python311\nltk_data'
- 'C:\Users\F5294746\AppData\Local\Programs\Python\Python311\share\nltk_data'
- 'C:\Users\F5294746\AppData\Local\Programs\Python\Python311\lib\nltk_data'
- 'C:\Users\F5294746\AppData\Roaming\nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'

During handling of the above exception, another exception occurred:

LookupError:

Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('stopwords')

For more information see: https://www.nltk.org/data.html

Attempted to load corpora/stopwords

Searched in:
- 'C:\Users\F5294746/nltk_data'
- 'C:\Users\F5294746\AppData\Local\Programs\Python\Python311\nltk_data'
- 'C:\Users\F5294746\AppData\Local\Programs\Python\Python311\share\nltk_data'
- 'C:\Users\F5294746\AppData\Local\Programs\Python\Python311\lib\nltk_data'
- 'C:\Users\F5294746\AppData\Roaming\nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'

Stack Trace:
Microsoft.PowerBI.ExploreServiceCommon.ScriptHandlerException: Python script error.

Vishy · February 19, 2023, 8:22pm

Here is an image of the PBIX:

Keith · February 19, 2023, 8:45pm

I’m not sure of this but check this within Power BI:

Vishy · February 20, 2023, 7:58am

Hi @Keith -
Python is installed correctly. If you create a normal data frame and plot it, it works correctly.

Vishy · February 20, 2023, 7:58am

@Keith - Do you have any Python code that can be used to build the N-grams for the shared PBIX?

AntrikshSharma · February 20, 2023, 8:23am

@Vishy First I downloaded the stop words and wordnet using a code editor.
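
The download step described above might look like the following minimal sketch. The helper name `ensure_nltk_data` and the `REQUIRED` mapping are hypothetical names chosen for illustration, not part of NLTK. The lookup and download functions are passed in as arguments so the logic can be tried without network access.

```python
# Hypothetical helper: fetch only the NLTK resources that are missing.
# Resource names map to the paths NLTK looks them up under.
REQUIRED = {
    "stopwords": "corpora/stopwords",  # stop-word lists used by the script
    "wordnet": "corpora/wordnet",      # data used by WordNetLemmatizer
}

def ensure_nltk_data(find, download, required=REQUIRED):
    """Call `download(name)` for each resource that `find(path)` cannot locate.

    `find` is expected to raise LookupError when a resource is missing,
    which is what the error message in this thread shows NLTK doing.
    Returns the list of resource names that were downloaded.
    """
    fetched = []
    for name, path in required.items():
        try:
            find(path)
        except LookupError:
            download(name)
            fetched.append(name)
    return fetched
```

Run once outside Power BI, with the same Python interpreter that Power BI is configured to use, this could be called as `ensure_nltk_data(nltk.data.find, nltk.download)` after `import nltk`. The plain calls `nltk.download('stopwords')` and `nltk.download('wordnet')` do the same thing unconditionally. By default the data should end up in one of the `nltk_data` folders listed under "Searched in" in the error message.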

Next I split the code into steps so it is easier to debug:

import re
import unicodedata
import nltk
from nltk.corpus import stopwords
import pandas as pd
import matplotlib.pyplot as plt

ADDITIONAL_STOPWORDS = ['covfefe']

def basic_clean(text):
  # Strip accents and punctuation, lowercase, drop stop words, lemmatize
  wnl = nltk.stem.WordNetLemmatizer()
  stop_words = set(stopwords.words('english') + ADDITIONAL_STOPWORDS)
  text = (unicodedata.normalize('NFKD', text)
    .encode('ascii', 'ignore')
    .decode('utf-8', 'ignore')
    .lower())
  words = re.sub(r'[^\w\s]', '', text).split()
  return [wnl.lemmatize(word) for word in words if word not in stop_words]

# `dataset` is the DataFrame that Power BI passes to a Python visual
words = basic_clean(' '.join(dataset['text'].astype(str)))
# Count the bigrams and keep the 12 most frequent
bigrams_series = pd.Series(list(nltk.ngrams(words, 2))).value_counts()[:12]
# Sort ascending so the most frequent bigram is drawn at the top
bigrams_series = bigrams_series.sort_values()
# plt.barh has no figsize argument, so plot through pandas instead
ax = bigrams_series.plot.barh(color='blue', width=.9, figsize=(12, 8))
plt.show()

test (1).pbix (31.6 KB)
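
To make the counting step easier to follow, here is a small standard-library sketch of what the `nltk.ngrams` plus `value_counts` combination computes. The function name `top_ngrams` and the sample sentence are made up for illustration; this is not the code from the article or the PBIX.

```python
from collections import Counter

def top_ngrams(words, n=2, k=12):
    """Return the k most frequent n-grams in `words` as (ngram, count) pairs."""
    # Zipping n staggered views of the list yields every run of n consecutive words
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams).most_common(k)

sample = "the cat sat on the mat and the cat slept".split()
print(top_ngrams(sample, n=2, k=1))  # [(('the', 'cat'), 2)]
```

Setting `n=3` would give trigrams in the same way. In the Power BI script the cleaned word list from `basic_clean` plays the role of `sample`.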

Keith · February 20, 2023, 11:52am

No, I don't, sorry.

Vishy · February 20, 2023, 12:52pm

@AntrikshSharma - I will look into it and get back to you. Thanks so much.

EnterpriseDNA · February 21, 2023, 1:14pm

Hello @Vishy

Did the responses above help solve your query?

If not, can you let us know where you’re stuck and what additional assistance you need?

If it did, please mark the answer as the SOLUTION.

Thank you

EnterpriseDNA · February 22, 2023, 8:30am

Hello @Vishy ,

Just following up to check whether the response above helped solve your inquiry.
If it did, please mark that answer as the SOLUTION.

We’ve noticed that no response was received from you on the post above. If there won’t be any activity in the next few days, we’ll tag this post as Solved.

Vishy · February 22, 2023, 12:29pm

I will be working on it today and will give feedback. I was busy with some priority reporting.

jeanrose87 · February 24, 2023, 6:20am

Hi @Vishy

I hope you are doing well. I was wondering if it would be possible for us to close this thread temporarily while you are trying out the solutions provided above. Once you have made progress and require additional support, feel free to open this thread or open a new one and reach out to us again.

Thank you for your understanding and I look forward to assisting you further in the future.

jeanrose87 · February 27, 2023, 9:50am

Hi @Vishy,

We’ve noticed that no response was received from you on the post above.

Just following up to check whether the response above helped solve your inquiry.
If it did, please mark that answer as the SOLUTION.

In case there won’t be any activity on it in the next few days, we’ll be tagging this post as Solved.

jeanrose87 · February 28, 2023, 12:57pm

Hi @Vishy

Due to inactivity, a response on this post has been tagged as “Solution”.

If you have a follow-up question or concern related to this topic, please remove the Solution tag first by clicking the three dots beside Reply and then untick the check box.

We kindly request that you take the time to answer the Enterprise DNA Forum User Experience Survey.

We hope you’ll give your insights on how we can further improve the Support forum. Thanks!

Vishy · March 1, 2023, 9:35am

I have got the solution. The answer above can be marked as resolved. Thanks @AntrikshSharma