Using PyCaret for Random Forest Classification (Employee churn) in Power BI

anonymous221 · July 18, 2023, 4:20pm

HR_comma_sep.csv (582.8 KB)
dashbiard.pbix (3.5 MB)

I’m trying to integrate a machine learning model (Random Forest) that I built in a Jupyter Notebook into Power BI, in order to predict whether an employee will leave based on their features. After testing multiple algorithms and tuning them with grid search, Random Forest turned out to perform best on my dataset (CSV attached).

I created a clone of my dataset table, called it “Training”, and ran this script inside that table, using the best hyperparameters found for my model:

# import classification module and setup environment

from pycaret.classification import *
clf1 = setup(dataset, target='left')

# train and save random forest classifier model

rf = create_model('rf', max_depth=8, max_features=0.5, n_estimators=300)
final_rf = finalize_model(rf)
save_model(final_rf, 'C:/Users/hp/Desktop/Sprint PBI/final_rf_powerbi')

Then, in the original table, I ran this Python script to load the saved pickle file:

from pycaret.classification import *
rf = load_model('C:/Users/hp/Desktop/Sprint PBI/final_rf_powerbi')
dataset = predict_model(rf, data = dataset)
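One thing worth checking before the load step: as far as I know, PyCaret's save_model appends ".pkl" to the name it is given, and load_model expects that same name without the extension. Below is a minimal sketch of a check that the pickle is actually where the script expects it. The helper names pickle_path and model_is_available are hypothetical and are not part of PyCaret.

```python
from pathlib import Path


def pickle_path(model_name):
    """Return the file that save_model(model, model_name) is assumed to write.

    Assumption: PyCaret appends ".pkl" to the name passed to save_model.
    """
    return Path(str(model_name) + ".pkl")


def model_is_available(model_name):
    """True if the pickle for model_name exists on this machine."""
    return pickle_path(model_name).is_file()


if __name__ == "__main__":
    name = "C:/Users/hp/Desktop/Sprint PBI/final_rf_powerbi"
    print(pickle_path(name).name)    # file name the loader is assumed to look for
    print(model_is_available(name))  # depends on the machine running the script
```

A check like this can help tell a path problem apart from a prediction problem, since the Python process that Power BI launches may not see the same folders as the notebook session.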

This adds two columns to my dataset: prediction_label and prediction_score.
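The names and meaning of these two columns can depend on the PyCaret version. The sketch below assumes that PyCaret 3.x uses prediction_label / prediction_score while 2.x used Label / Score, and that the score is the probability of the predicted class rather than the probability of leaving. Both points should be verified against the installed version. find_prediction_columns and leave_probability are hypothetical helper names.

```python
def find_prediction_columns(columns):
    """Return (label_column, score_column) for whichever naming scheme is present.

    Assumption: 3.x-style names first, then 2.x-style names.
    """
    candidates = [("prediction_label", "prediction_score"), ("Label", "Score")]
    for label_col, score_col in candidates:
        if label_col in columns and score_col in columns:
            return label_col, score_col
    raise KeyError("no prediction columns found in: {}".format(list(columns)))


def leave_probability(label, score):
    """Convert (label, score) into the probability that left == 1.

    Assumption: score is the probability of the predicted class, so a
    prediction of 0 with score 0.9 means a 0.1 probability of leaving.
    """
    return score if label == 1 else 1.0 - score


if __name__ == "__main__":
    cols = ["satisfaction_level", "salary", "prediction_label", "prediction_score"]
    print(find_prediction_columns(cols))
    print(leave_probability(1, 0.8))
```

With a pandas DataFrame, the column list would come from list(predictions.columns).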

Finally, I want to display a prediction text based on the predicted value, using the Python visual. The input features will also be my filters/slicers, so when their values change, the prediction should change too.
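Since the slicers decide which rows reach the visual, the selection may contain one employee, many, or none. A minimal sketch of summarizing the predicted labels for whatever rows are currently selected is shown here. summarize_predictions is a hypothetical helper, and it assumes the label 1 means "left".

```python
def summarize_predictions(labels):
    """Count rows and predicted leavers among the currently selected rows."""
    labels = list(labels)
    total = len(labels)
    leavers = sum(1 for lab in labels if lab == 1)
    # Guard against an empty selection so the visual does not divide by zero.
    share = leavers / total if total else 0.0
    return {"rows": total, "predicted_leavers": leavers, "share": share}


if __name__ == "__main__":
    # Illustrative labels only, not real model output.
    print(summarize_predictions([1, 0, 0, 1]))
```

If the summary reports more than one row, a single sentence about "the employee" is ambiguous, so it may be worth tightening the slicers or showing the share instead.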

Here’s a sneak peek; the PBIX file and the dataset are attached.

Here’s the code of the Python visual:

# Preamble generated by Power BI (always executed before the script):
# dataset = pandas.DataFrame(average_montly_hours, bonus, Department, last_evaluation, left, number_project, promotion_last_5years, salary, satisfaction_level, time_spend_company, Work_accident)
# dataset = dataset.drop_duplicates()

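import pandas as pd
# The model was trained with pycaret.classification, so load it from the same module
from pycaret.classification import load_model, predict_model
import matplotlib.pyplot as plt

rf = load_model('C:/Users/hp/Desktop/Sprint PBI/final_rf_powerbi')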

# Use the trained model to make predictions

predictions = predict_model(rf, data=dataset)

# Get the predicted label and score columns from the predictions

predicted_label = predictions['prediction_label']
predicted_score = predictions['prediction_score']

# Check if any element in predicted_label is equal to 1

if (predicted_label == 1).any():
    # score = predicted_score.iloc[0]
    # output_text = "The employee is likely to leave with a probability of {:.2f}".format(score)
    output_text = "The employee is likely to leave"
else:
    output_text = "The employee is not likely to leave"

def display_text(s, color="blue", fontsize=20):
    plt.figure(facecolor="none")
    plt.text(0, 0, s, color=color, fontsize=fontsize)
    plt.axis('off')  # Optional, remove axes from the plot
    plt.show()

# Call the function to display the text

display_text(output_text)
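For comparison, here is a minimal sketch of one way the message and the text rendering could be written. It averages the scores of the rows predicted to leave instead of taking the first row's score, and it centers and wraps the text so it stays inside the visual. build_message and the width parameter are hypothetical, and the sketch assumes the score is the probability of the predicted class.

```python
import textwrap


def build_message(labels, scores):
    """Return the display text for the rows passed to the visual."""
    # Keep only the scores of rows predicted to leave (label == 1).
    leaver_scores = [s for lab, s in zip(labels, scores) if lab == 1]
    if not leaver_scores:
        return "The employee is not likely to leave"
    mean_score = sum(leaver_scores) / len(leaver_scores)
    return "The employee is likely to leave with a probability of {:.2f}".format(mean_score)


def display_text(s, color="blue", fontsize=20, width=40):
    """Draw wrapped text in the middle of an otherwise empty figure."""
    import matplotlib.pyplot as plt  # imported here so build_message works without it

    plt.figure(facecolor="none")
    # Default axes run from 0 to 1, so (0.5, 0.5) is the center of the plot area.
    plt.text(0.5, 0.5, textwrap.fill(s, width), color=color,
             fontsize=fontsize, ha="center", va="center")
    plt.axis("off")
    plt.show()


if __name__ == "__main__":
    # Illustrative values only, not real model output.
    print(build_message([1, 0], [0.75, 0.90]))
```

Inside the visual, the call would look like display_text(build_message(predicted_label.tolist(), predicted_score.tolist())).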

How can I correct my work so that the prediction is displayed correctly?