T test in Power BI

annamarie · December 26, 2020, 4:24am

Hi,

I need some help with statistical analysis in Power BI. I need to do a T.Test in Power BI, done over 2 periods of time. The data has 2 categories applied, and I’m not sure how to go about it.

The other challenge is that this needs to be done on the rolling average of these two periods.

I have done some research on how to manually do a t test in excel and compared the results to that of the Data Analysis add in in excel. And I got them to match, I tried to adapt it in Power BI, but no luck.

The other hurdle is, how to get the P.value from the T Stat, maybe by adding a t table in the model?

I believe this is a 2 tailed paired/ dependant t test.

Manual calc that I applied

T Crit calc – t distribution table

I have attached an excel example of what the results should be. There are 3 tabs:

Manual calc
The Data Analysis result from built in Excel features
The required result I would like in Power BI

The periods are, the last 6 months of the data and the other for the 6 months before that.

This Power BI community answer gives a good idea of what needs to be done, but my data isn’t optimal for this solution.

Any help would be appreciated.
Ttest excel result.xlsx (49.9 KB) Sample Doc.pbix (134.9 KB)

BrianJ · December 26, 2020, 8:12am

@annamarie,

Welcome to the forum – great to have you here!

With regard to your question, I would strongly advise against doing anything except for the most basic statistical analysis via DAX. It’s certainly possible to do what you want to do with DAX, but it’s akin to building a house with a Swiss Army knife - you will have to build the test statistics manually, as well as the critical value and p value lookups, as well as the comparison measure between the test statistic and critical value. Here’s what that looks like based on code from the Microsoft Community:

Paired t-test in DAX

Summary

Significance = 
//We first extend our sample data, but filter out missing Score2 data. We add a difference column and a difference^2 column
VAR _tmpTableForT = ADDCOLUMNS(FILTER(PairedT, ISERROR(PairedT[Score2]) = FALSE), 
    "Diff", PairedT[Score1] - PairedT[Score2], 
    "DiffSQ", (PairedT[Score1] - PairedT[Score2])^2)

//We create a few variables that make calculating the T-value easier.
VAR sumDiff = SUMX(_tmpTableForT, [Diff])
VAR sumDiffSQ = SUMX(_tmpTableForT, [DiffSQ])
VAR sumDiffColSQ = sumDiff^2
VAR N = COUNTROWS(_tmpTableForT)

//We calculate the T_Score
VAR T_nominator = sumDiff/N
VAR T_denominator = SQRT((sumDiffSQ - (sumDiffColSQ / N)) / ((N-1)*N))
VAR T_Score = ABS(T_nominator/T_denominator)

//We lookup the T-value of the three significance levels. First we assess the row to compare against based on N-1
VAR deg = SWITCH(TRUE(),
    N-1 <= 30, N-1,
    N-1 <= 45, 30,
    N-1 <= 90, 60,
    N-1 <= 560, 120,
    1000)
VAR T_0_05 = LOOKUPVALUE(T_Table[Sig_0.05], T_Table[DegreesOfFreedom], deg)
VAR T_0_01 = LOOKUPVALUE(T_Table[Sig_0.01], T_Table[DegreesOfFreedom], deg)
VAR T_0_001 = LOOKUPVALUE(T_Table[Sig_0.001], T_Table[DegreesOfFreedom], deg)

//If the T_score  is bigger than the T_sig, then it is at least that significance level. So, if T_score < T_0_05, it is not significant. If T_Score > T_0_001, then significance is  ***
VAR result = SWITCH(TRUE(),
    T_Score > T_0_001, "*** p=0.001",
    T_Score > T_0_01, "** p=0.01",
    T_Score > T_0_05, "* p=0.05",
    "Not significant")
RETURN
result
```
[/details]

The great news is that Power Bi integrates beautifully with R, and you can run R scripts and visualizations easily within PBI.

By comparison, what it takes DAX nearly 40 lines to do, R does better in 2 lines of code:

Compute t-test

res ← t.test(before, after, paired = TRUE)
res

It takes a liittle bit of work up front to install R, R Studio (a fantastic free IDE for R used by almost every R user) and Tidyverse, a compilation of some of the best/most popular R add-on packages for data import, cleaning/manipulation, output and visualization. I’ve put together some links below for you on how to get set up, and also how to script out your t-test analyses.

While it’s possible to do very powerful data cleaning/prep in R, typically what I do is prep my data in Power Query/M, and write my R scripts to do statistical analysis in R Studio, and then run the scripts from within Power Query. There’s a good simple example of this in the 3-Minute Tips video I linked to below.

Installing R and R studio

Installing Tidyverse Package in R

Writing T-Test Scripts in R

Conducting Paired T-Tests in R

Running R Scripts in Power Bi

I hope this is helpful. Feel free to give a shout if you have any additional questions.

Brian

EnterpriseDNA · December 28, 2020, 2:10am

Hi @annamarie, did the response provided by @BrianJ help you solve your query? If not, how far did you get and what kind of help you need further? If yes, kindly mark the thread as solved. Thanks!

EnterpriseDNA · December 29, 2020, 12:55am

Hi @annamarie, we’ve noticed that no response has been received from you since the 26th of December. We just want to check if you still need further help with this post? In case there won’t be any activity on it in the next few days, we’ll be tagging this post as Solved. If you have a follow question or concern related to this topic, please remove the Solution tag first by clicking the three dots beside Reply and then untick the checkbox. Thanks!

annamarie · January 2, 2021, 4:44am

Apologies ive been out of range. It looks like a good option. I will mark it asap. Early next week latest. If it worked. Thanx.

annamarie · January 2, 2021, 4:46am

Thank you Brian for this extensive answer to my question, i really appreciate it, I will be working on this today, and revert back soonest.
Annamarie

BrianJ · January 2, 2021, 5:30am

@annamarie,

Glad to help. Just give a shout if you run into any problems in the implementation.

Brian