Deneb Example - Violin Plot

Deneb/Vega-Lite can be used to generate a Violin Plot, which compares the distribution of data between categories. The probability density is usually estimated and smoothed with a kernel density estimation (KDE) algorithm. The distribution can alternatively be presented as a Box Plot, where the “box” displays the 75th percentile, median, and 25th percentile, and the “whiskers” display the maximum and minimum (when the min-max extent is used, as it is in this example).

The example presented here overlays a box plot on a kernel density estimation plot to increase the utility of the visual. It uses the boxplot mark and the area mark along with the density transform built into Vega-Lite. Two different datasets are used to display both vertical and horizontal violin plots.
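For readers new to the density transform, here is a minimal standalone sketch of the KDE step on its own, intended for the online Vega Editor rather than Deneb. The inline values, the "amount" field name, and the [0, 60] extent are made up for illustration and are not part of the datasets used in this example.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch only: KDE of a made-up 'amount' field.",
  "data": {
    "values": [
      {"amount": 12}, {"amount": 15}, {"amount": 18}, {"amount": 21},
      {"amount": 22}, {"amount": 24}, {"amount": 30}, {"amount": 41}
    ]
  },
  "transform": [
    {
      "density": "amount",
      "extent": [0, 60],
      "as": ["value", "density"]
    }
  ],
  "mark": {"type": "area", "opacity": 0.5},
  "encoding": {
    "x": {"field": "value", "type": "quantitative", "title": "amount"},
    "y": {"field": "density", "type": "quantitative"}
  }
}
```

The transform replaces the input rows with sampled (value, density) pairs, which is why the area mark encodes the two output fields rather than the original field. In Deneb the "data" block would instead reference the "dataset" name supplied by Power BI.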

Thanks to @DM-P for developing the excellent Violin Plot custom visual for Power BI (that has been available for free in AppSource for a few years now) which served as the basis for this investigation, and especially for his thoughts and suggestions for rendering such a visual using Deneb/Vega-Lite. As well, his knowledge of the Vega-Lite syntax for rendering vertical area charts was instrumental in the preparation of this example.

The violin plot was actually the starting point for my investigation, but I wanted to explore additional datasets before publishing. I did, however, see that this technique could easily be leveraged to create a Ridge Plot, so I released the ridge plot first.

The synthetic datasets used were:

The 3 visuals share mostly the same code (which is quite short in all cases, with fewer than 200 lines of code after single-line formatting), and the code of the vertical sales example is described below.

This example illustrates a number of Deneb/Vega-Lite features, including:
0 - General:

  • a “title” block with a 3-line subtitle array
  • a “facet” block to implement small multiples by category (Country)
  • a “spec” block with
    • width and height to be used for each category
    • a shared “encoding” block to ensure all marks use the same X and Y axes and colours
      • the Y axis configured with:
        • a custom scale domain and axis tick count to ensure even grid intervals
      • the X axis configured with:
        • no grid lines or tick marks
      • the category colours are configured with:
        • a dynamic scale using the eight colours in the current Power BI theme as made available by Deneb
    • a “layer” block for the KDE plot and the box plot

1 - KDE Plot:

  • a shared “transform” block with:
    • a “density” transform to calculate the KDE distribution for the sales and construct a new dataset
    • a “calculate” transform to invert (x -1) the positive density calculated in the previous step
  • a “layer” block with:
    • an “area” mark with vertical orientation and 50% opacity using the positive density
    • an “area” mark with vertical orientation and 50% opacity using the negative density

2 - Box Plot:

  • a “boxplot” mark with:
    • “min-max” extent
    • black border and lines (stroke, median, whiskers)
    • dark grey fill colour
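As a side note on the “min-max” extent listed above: the boxplot mark also accepts a numeric extent, where 1.5 gives the Tukey-style whiskers (reaching at most 1.5 × IQR beyond the box, with points outside drawn as outliers). The standalone sketch below, for the Vega Editor, uses made-up inline values with one deliberate outlier; switching "extent" between 1.5 and "min-max" shows the difference.

```json
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "description": "Illustrative sketch only: box plot extent on made-up data.",
  "data": {
    "values": [
      {"amount": 12}, {"amount": 15}, {"amount": 18}, {"amount": 21},
      {"amount": 22}, {"amount": 24}, {"amount": 30}, {"amount": 95}
    ]
  },
  "mark": {
    "type": "boxplot",
    "extent": 1.5,
    "median": {"color": "black", "strokeWidth": 2},
    "size": 20
  },
  "encoding": {
    "y": {"field": "amount", "type": "quantitative"}
  }
}
```

With "min-max" the upper whisker would stretch to 95 and no outlier point would be drawn, which suits a violin overlay where the KDE outline already shows the tails.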
Deneb/Vega-Lite JSON Code:
{
  "title": {
    "anchor": "start",
    "align": "left",
    "offset": 20,
    "text": "Power BI Violin Plot using Deneb - Sales (Vertical)",
    "font": "Verdana",
    "fontSize": 24,
    "fontWeight": "bold",
    "fontStyle": "normal",
    "subtitle": [
      "Components: Box Plot and Kernel Density Estimation Plot",
      "Data Source: Enterprise DNA Practice Dataset (dates adjusted forward by 7 years)",
      "Distribution of Sales by Country"
    ],
    "subtitleFont": "Verdana",
    "subtitleFontSize": 16,
    "subtitleFontWeight": "normal",
    "subtitleFontStyle": "italic"
  },
  "data": {
    "name": "dataset"
  },
  "spacing": 0,
  "facet": {
    "column": {
      "field": "Country",
      "header": {
        "orient": "bottom",
        "title": null,
        "labelFontSize": 14
      }
    }
  },
  "spec": {
    "width": 180,
    "height": 500,
    "encoding": {
      "y": {
        "type": "quantitative",
        "title": "Total Sales",
        "axis": {
          "tickCount": 10,
          "titleFontSize": 20,
          "labelFontSize": 15
        }
      },
      "x": {
        "type": "quantitative",
        "axis": {
          "labels": false,
          "title": null,
          "grid": false,
          "ticks": false
        }
      },
      "color": {
        "field": "Country",
        "type": "nominal",
        "legend": null,
        "scale": {
          "range": [
            {
              "expr": "pbiColor(0)"
            },
            {
              "expr": "pbiColor(1)"
            },
            {
              "expr": "pbiColor(2)"
            },
            {
              "expr": "pbiColor(3)"
            },
            {
              "expr": "pbiColor(4)"
            },
            {
              "expr": "pbiColor(5)"
            },
            {
              "expr": "pbiColor(6)"
            },
            {
              "expr": "pbiColor(7)"
            }
          ]
        }
      }
    },
    "layer": [
      {
        "name": "KDE_PLOT",
        "transform": [
          {
            "density": "Total Sales",
            "groupby": [
              "Country"
            ],
            "extent": [
              0,
              100000
            ],
            "as": [
              "_kde_value",
              "_kde_density"
            ]
          },
          {
            "calculate": "datum['_kde_density'] * -1",
            "as": "_negative_kde_density"
          }
        ],
        "layer": [
          {
            "name": "KDE_POSITIVE",
            "mark": {
              "type": "area",
              "orient": "vertical",
              "opacity": 0.5
            },
            "encoding": {
              "y": {
                "field": "_kde_value"
              },
              "x": {
                "field": "_kde_density"
              }
            }
          },
          {
            "name": "KDE_NEGATIVE",
            "mark": {
              "type": "area",
              "orient": "vertical",
              "opacity": 0.5
            },
            "encoding": {
              "y": {
                "field": "_kde_value"
              },
              "x": {
                "field": "_negative_kde_density"
              }
            }
          }
        ],
        "encoding": {
          "x2": {
            "datum": 0
          }
        }
      },
      {
        "name": "BOX_PLOT",
        "mark": {
          "type": "boxplot",
          "extent": "min-max",
          "median": {
            "color": "black",
            "strokeWidth": 2
          },
          "size": 20
        },
        "encoding": {
          "y": {
            "field": "Total Sales"
          },
          "fill": {
            "value": "#969696"
          },
          "stroke": {
            "value": "black"
          }
        }
      }
    ]
  }
}
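Two optional variations on the shared “encoding” block of the listing above are sketched here as a fragment, not a complete spec. The explicit Y domain of [0, 100000] is an assumption that simply mirrors the density extent; it would need adjusting if sales values fall outside that range. The "pbiColorNominal" scheme is, as far as I know, the Deneb-provided scheme for the current theme's categorical colours, and could stand in for the eight pbiColor() expressions; please verify both against your own report before relying on them.

```json
{
  "encoding": {
    "y": {
      "type": "quantitative",
      "title": "Total Sales",
      "scale": {"domain": [0, 100000]},
      "axis": {
        "tickCount": 10,
        "titleFontSize": 20,
        "labelFontSize": 15
      }
    },
    "color": {
      "field": "Country",
      "type": "nominal",
      "legend": null,
      "scale": {"scheme": "pbiColorNominal"}
    }
  }
}
```

Pinning the domain keeps the ten-tick grid at even intervals regardless of the data in each facet. The X channel is omitted from the fragment because it would stay as in the listing.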

Also included is the development sample PBIX using the 2 synthetic datasets described above.

The intent of these examples was not to provide finished visuals, but rather to explore the use of the Deneb custom visual and the Vega-Lite language within Power BI and to serve as starting points for further development.

These examples are provided as-is for information purposes only, and their use is solely at the discretion of the end user; no responsibility is assumed by the author.

Greg
Deneb Example - Violin Plot - V5.pbix (3.3 MB)

marking as solved