Deneb Example - Statistics

Deneb/Vega-Lite can transform a dataset using a one-dimensional kernel density estimation (KDE) into a new dataset of samples of the estimated densities, and statistics can then be extracted from this transformed sample dataset. The example herein presents a probability density of 2023 model year new vehicle fuel economy, underlayed by vertical lines at various probabilities, and overlayed by a set of the (internally-calculated) statistical values.

This example illustrates a number of Deneb/Vega-Lite features, including:

0 - General:

  • use of a “lineBreak” character in the “config” section for visual-wide application
  • use of a “layer” block consisting of 5 sections: quantiles, Q1-Q3 background rectangle, density plot, statistics, and footnote

1 - Quantiles (i.e., probabilities):

  • use of a “transform” block with the “qauntile” operation to add an internal Vega-Lite calculations of the 5%, 25%, 50%, 75%, and 95% probabilities for the dataset
  • use of a “layer” block with:
    • a “rule” mark with X value only to display the vertical lines for the 5%, 25%, 50%, 75%, and 95% probabilities
    • a “text” mark with X value only to display the quantile labels (P5=5%; Q1=25%; Q2=50%; Q3=75%; P95=95%)

2 - Q1-Q3 Background Rectangle (i.e., 25%-75%):

  • use of a “rect” mark with the “q1” and “q3” aggregations to have Vega-Lite perform the internal calculations for the position of the background rectangle

3 - Density Plot:

  • use of a “transform” block to add an internal Vega-Lite calculation of the probability density of the dataset
  • use of a “line” mark to display the probability density with a fixed X-axis scale to help visualize comparisons of filtered datasets (0-60 mpg) and a Y-axis using Power BI formatting of percentages

4 - Statistics:

  • use of a “transform” block with
    • “joinaggregate” operations to add internal Vega-Lite calculations of standard deviation, variance, mean, median, 5% probability, 25% probability (i.e., Q1), 75% probability (i.e., Q3), and 95% probability
    • composition of individual statistic labels, all with 3 significant digits
    • composition of a single statistics label (collated, multi-line) using the “lineBreak” character
  • use of a “text” mark to display the collated statistics label at hard-coded X and Y positions

5 - Footnote:

  • use of a “text” mark with hard-coded text (incl. “lineBreak” character) and X and Y positions
JSON Specification Code
{
  "title": {
    "anchor": "start",
    "align": "left",
    "text": "Power BI Statistics using Deneb",
    "font": "Verdana",
    "fontSize": 36,
    "fontWeight": "bold",
    "fontStyle": "normal",
    "subtitle": "Vehicle Fuel Economy Performance",
    "subtitleFont": "Verdana",
    "subtitleFontSize": 20,
    "subtitleFontWeight": "normal",
    "subtitleFontStyle": "italic",
    "offset": 10
  },
  "data": {"name": "dataset"},
  "layer": [
    {
      "name": "QUANTILES",
      "transform": [
        {
          "quantile": "Fuel Economy",
          "probs": [
            0.05,
            0.25,
            0.5,
            0.75,
            0.95
          ]
        }
      ],
      "layer": [
        {
          "name": "QUANTILE_LINES",
          "mark": {
            "type": "rule",
            "strokeWidth": 3,
            "tooltip": null
          },
          "encoding": {
            "x": {
              "field": "value",
              "type": "quantitative"
            },
            "color": {
              "condition": [
                {
                  "test": "datum.prob == 0.05 || datum.prob == 0.95",
                  "value": "#BB2649"
                },
                {
                  "test": "datum.prob == 0.50",
                  "value": "#969696"
                },
                {
                  "test": "datum.prob == 0.25 || datum.prob == 0.75",
                  "value": "#969696"
                }
              ],
              "value": "transparent"
            }
          }
        },
        {
          "name": "QUANTILE_LABELS",
          "mark": {
            "type": "text",
            "fontSize": 18,
            "y": 0.1,
            "yOffset": -2,
            "baseline": "bottom"
          },
          "encoding": {
            "x": {
              "field": "value",
              "type": "quantitative"
            },
            "text": {
              "condition": [
                {
                  "test": "datum.prob == 0.05",
                  "value": "P5"
                },
                {
                  "test": "datum.prob == 0.25",
                  "value": "Q1"
                },
                {
                  "test": "datum.prob == 0.5",
                  "value": "Q2"
                },
                {
                  "test": "datum.prob == 0.75",
                  "value": "Q3"
                },
                {
                  "test": "datum.prob == 0.95",
                  "value": "P95"
                }
              ],
              "field": "prob",
              "title": "Probability",
              "format": ".0%"
            },
            "color": {
              "condition": [
                {
                  "test": "datum.prob == 0.05 || datum.prob == 0.95",
                  "value": "#BB2649"
                },
                {
                  "test": "datum.prob == 0.25 || datum.prob == 0.75",
                  "value": "#7D868A"
                },
                {
                  "test": "datum.prob == 0.5",
                  "value": "#7D868A"
                }
              ],
              "value": "transparent"
            }
          }
        }
      ]
    },
    {
      "name": "25_75_RECTANGLE",
      "mark": {
        "type": "rect",
        "tooltip": null,
        "color": "#E3E3E3",
        "opacity": 0.5
      },
      "encoding": {
        "x": {
          "aggregate": "q1",
          "field": "Fuel Economy"
        },
        "x2": {
          "aggregate": "q3",
          "field": "Fuel Economy"
        }
      }
    },
    {
      "name": "DENSITY_LINE_PLOT",
      "transform": [
        {
          "density": "Fuel Economy",
          "bandwidth": 0.4
        }
      ],
      "layer": [
        {
          "name": "LINE",
          "mark": {
            "type": "line",
            "tooltip": true,
            "strokeWidth": 4,
            "color": "#35558A"
          },
          "encoding": {
            "x": {
              "field": "value",
              "type": "quantitative",
              "title": "Fuel Economy (mpg)",
              "scale": {
                "domain": [10, 60]
              },
              "axis": {
                "titleFontSize": 16,
                "titleFontWeight": "bold",
                "labelFontSize": 16,
                "labelFontWeight": "normal",
                "tickCount": 10,
                "ticks": true,
                "grid": true,
                "domain": true
              }
            },
            "y": {
              "field": "density",
              "type": "quantitative",
              "title": "Distribution of Readings",
              "axis": {
                "titleFontSize": 16,
                "titleFontWeight": "bold",
                "labelFontSize": 16,
                "labelFontWeight": "normal",
                "tickCount": 5,
                "domain": true,
                "formatType": "pbiFormat",
                "format": "0.%"
              }
            }
          }
        }
      ]
    },
    {
      "name": "STATISTICS",
      "transform": [
        {
          "joinaggregate": [
            {
              "op": "stdev",
              "field": "Fuel Economy",
              "as": "_stdev"
            }
          ]
        },
        {
          "joinaggregate": [
            {
              "op": "variance",
              "field": "Fuel Economy",
              "as": "_variance"
            }
          ]
        },
        {
          "joinaggregate": [
            {
              "op": "mean",
              "field": "Fuel Economy",
              "as": "_mean"
            }
          ]
        },
        {
          "joinaggregate": [
            {
              "op": "median",
              "field": "Fuel Economy",
              "as": "_median"
            }
          ]
        },
        {
          "joinaggregate": [
            {
              "op": "ci0",
              "field": "Fuel Economy",
              "as": "_p5"
            }
          ]
        },
        {
          "joinaggregate": [
            {
              "op": "ci1",
              "field": "Fuel Economy",
              "as": "_p95"
            }
          ]
        },
        {
          "joinaggregate": [
            {
              "op": "q1",
              "field": "Fuel Economy",
              "as": "_q1"
            }
          ]
        },
        {
          "joinaggregate": [
            {
              "op": "q3",
              "field": "Fuel Economy",
              "as": "_q3"
            }
          ]
        },
        {
          "calculate": "'Standard Deviation: ' + format(datum._stdev, '.3s')",
          "as": "_stdev_label"
        },
        {
          "calculate": "'Variance: ' + format(datum._variance, '.3s')",
          "as": "_variance_label"
        },
        {
          "calculate": "'Mean: ' + format(datum._mean, '.3s')",
          "as": "_mean_label"
        },
        {
          "calculate": "'Median: ' + format(datum._median, '.3s')",
          "as": "_median_label"
        },
        {
          "calculate": "'Quartile 1 (Q1): ' + format(datum._q1, '.3s')",
          "as": "_q1_label"
        },
        {
          "calculate": "'Quartile 3 (Q3): ' + format(datum._q3, '.3s')",
          "as": "_q3_label"
        },
        {
          "calculate": "datum._stdev_label + '|' + datum._variance_label + '|' + datum._q1_label + '|' + datum._mean_label + '|' + datum._median_label + '|' + datum._q3_label",
          "as": "_statistics_label"
        }
      ],
      "mark": {
        "type": "text",
        "lineBreak": "|",
        "font": "Segoe UI",
        "fontStyle": "italic",
        "fontWeight": "normal",
        "fontSize": 12,
        "align": "right",
        "x": 1180,
        "y": 10
      },
      "encoding": {
        "text": {
          "field": "_statistics_label",
          "type": "nominal"
        }
      }
    },
    {
      "name": "FOOTNOTE",
      "mark": {
        "type": "text",
        "align": "left",
        "fontStyle": "italic"
      },
      "encoding": {
        "text": {
          "value": "* Fuel Economy Guide - Model Year 2023 New Vehicles|* U.S. EPA & DOE (fueleconomy.gov)"
        },
        "x": {"value": -60},
        "y": {"value": 590}
      }
    }
  ]
}

The intent of this example is not to provide a finished visual, but rather to serve as a starting point for further custom visual development.

Also included is the sample PBIX using public data from the Fuel Economy Guide for Model Year 2023 New Vehicles, U.S. Environmental Protection Agency and U.S. Department of Energy (fueleconomy.gov).

This example is provided as-is for information purposes only, and its use is solely at the discretion of the end user; no responsibility is assumed by the author.

Greg
Deneb Examples - Statistics.pbix (121.4 KB)

1 Like

marking as solved