A Three Level Decision Tree for Selecting the Perfect Visualization for Your Data

Nowadays, the modern workplace is pushing toward using more and more data in its decision-making processes. With big data and business analytics now mainstream, organizations no longer require guidance from the IT department to analyze data; tools exist that allow any employee to connect to shared data sources and run their own analysis.

But analyzing data also requires visualizations, both to discover insights in the data and to share them with colleagues and managers. The value of Self Service Business Intelligence is not only exploratory: the ability to share and socialize information is a must in the data analysis process.

Data scientists have existed for decades. Although the term is in vogue in every modern organization, the reality is that huge amounts of data have always existed in fields like medicine, drug discovery, microbiological sciences, advanced finance, and biotech. The field is mature enough to have produced a set of best practices for showing data to non-scientific audiences, so that the insights obtained are understood and can help persuade people of an idea.

In this article, we are showing a simple cheat sheet called the Chart Suggestions—A Thought-Starter from The Extreme Presentation™ Method created by Dr. Andrew Abela.

Selecting the perfect visualization

Abela’s cheat sheet provides a general overview that starts with a simple question: What would you like to show? The answer leads into a decision tree that determines the kind of visualization you can use.

Abela's chart chooser

For the diagram, we are using Abela’s Chart Chooser PowerPoint Templates from SlideModel, as they allow users to download the template and edit data-driven charts directly in PowerPoint.

The four main branches of the decision tree are based on the purpose of the visualization:

  • Relationship
  • Comparison
  • Distribution
  • Composition

At each node, the questions focus on the nature of the information (static or dynamic), the number of variables involved, and the number of series and categories. The following sections dive into each branch of the tree and present some examples.

Relationship

When the purpose of the chart is to show the relationship between variables, or to infer insight from that relationship, the cheat sheet proposes two charts:

  1. For analysis of the relationship between two variables, the suggested visualization is a simple scatter plot. Relationships can be detected through clustering (grouping points that seem to belong together) or through extrapolation and inference, when the points trace out a recognizable geometric shape such as a line. An example would be the analysis of the relationship between a person’s height and weight, as in a BMI chart. Each cluster can be defined as a category of the person’s current health status.

Relationship scatter chart

  2. For three variables, a useful visualization is the bubble chart. It is similar in nature to the scatter plot, but it adds the diameter of each point as a third variable. An example of this chart would be the analysis of the relationship between product price, product sales, and each product’s percentage of total sales (which would be the diameter).
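
The bubble chart example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the product names and figures are made up, and `bubble_shares` is a hypothetical helper, not part of any charting library. It computes each product's share of total sales, which is the value the example maps to the bubble size.

```python
def bubble_shares(sales_by_product):
    """Return each product's share of total sales, as a percentage."""
    total = sum(sales_by_product.values())
    if total == 0:
        raise ValueError("total sales must be non-zero")
    return {name: 100.0 * units / total for name, units in sales_by_product.items()}


# Hypothetical data: product -> (unit price, units sold)
products = {
    "basic": (10.0, 500),
    "standard": (25.0, 300),
    "premium": (60.0, 200),
}

shares = bubble_shares({name: units for name, (_, units) in products.items()})

# One bubble per product: x = price, y = units sold, size = share of total sales
bubbles = [(price, units, shares[name]) for name, (price, units) in products.items()]
```

If you pass these values to a plotting library, check whether its size parameter means area or diameter. In matplotlib's `scatter`, for example, the `s` argument is the marker area, so the shares would need to be converted if you want the diameter to carry the value.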

Comparison

The comparison branch of the decision tree starts with a choice between comparing among items and comparing over time.

Comparing among items

Comparison bar charts

  1. Parallel Bar Charts: Bar charts are ideal for comparing elements with multiple items and different categories. An example would be to compare two car brands (categories 1 and 2) and analyze the price of each segment (each bar represents a segment: luxury, family, etc.).
  2. Column Charts: Ideal for few items and categories. Continuing the previous example, the user can chart a specific segment for both car brands. In this case, the series are represented by the colors of the chart.
  3. Table with Embedded Charts: For comparisons of many elements and many categories, the decision tree suggests the use of tables with embedded charts. Each chart can describe the behavior of different variables in the context of the two dimensions that define the table. Continuing the example of comparing car segment prices across brands, you can use a table where each column is a region (adding a new dimension, geography), each row is a segment, and each cell contains a bar chart comparing the prices of the two brands.
  4. Variable Width Column Charts: For many categories and two variables, the variable width column chart is ideal for comparing elements. The vertical and horizontal axes each carry one variable, and the height and width of each column vary along those axes. An application of such a chart would be to compare the prices and sales volumes of car segments. For example, if the horizontal axis represents the price and the height the sales volume, it is reasonable for the chart to step downward like a waterfall: the higher the price, the fewer the sales.
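
The four options above can be read as a small set of rules. The sketch below is one possible reading of the list, not an official implementation of the Chart Chooser: the function name and its parameters are hypothetical, and what counts as "many" items or categories is left to the caller.

```python
def suggest_item_comparison(variables_per_item, many_categories, many_items):
    """Suggest a chart for comparing among items.

    variables_per_item: 1 or 2
    many_categories, many_items: booleans decided by the caller
    """
    if variables_per_item not in (1, 2):
        raise ValueError("this branch only covers one or two variables per item")
    if variables_per_item == 2:
        # Two variables: one drives the column height, the other its width
        return "variable width column chart"
    if many_categories:
        # Too many categories for a single chart: one small chart per table cell
        return "table with embedded charts"
    if many_items:
        return "bar chart"
    return "column chart"


# Two car brands, a single segment, one variable (price)
suggestion = suggest_item_comparison(1, many_categories=False, many_items=False)
```

In the car example, adding sales volume as a second variable would switch the suggestion to the variable width column chart.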

Comparing through time

Comparison multiple line chart

  1. Multiple Line Charts: When you have a time dimension and many categories, a multiple line chart is ideal for comparing the behavior of each category over time.
  2. Single Line Chart: When you have few periods and only one category, a single line chart will be enough.
  3. Column Chart: Columns are ideal for comparing values, because the contrast of one column against another is easy to read. When you want to compare over time with few categories, the column chart is the visualization to choose.
  4. Circular Area Chart: Also called a spider web chart, it is ideal for cyclical data. In Abela’s chart this option sits under the many-periods branch.

Distribution

The first level of the distribution branch divides between a single variable and multiple variables.

Distribution column histogram

  1. Column Histogram: When the analysis is for a single variable and the distribution is discrete (meaning you can define buckets), a column histogram is the most appropriate visualization. A typical example would be counting the number of purchases per hour of the day. Each hour serves as a “bucket”, and the count of purchases gives the height of the column.
  2. Line Histogram: When the analysis involves a single variable but its distribution is continuous, a line histogram is preferred.
  3. 3D Area Chart: When the distribution analysis involves three variables, a 3D area chart is the chart of choice. The user can analyze the data along three axes and read the distribution as a surface instead of just lines.
  4. Scatter Plot: For distribution analysis of two variables, the chart of choice is the scatter plot. The user can see how the points group together and infer whether there is a correlation that forms a pattern.
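
The purchases-per-hour example from the column histogram item can be sketched as follows. The timestamps are invented for illustration and `purchases_per_hour` is a hypothetical helper name; the point is only to show how each hour acts as a bucket and how the count becomes the column height.

```python
from collections import Counter
from datetime import datetime


def purchases_per_hour(timestamps):
    """Count purchases in each of the 24 hourly buckets (index = hour of day)."""
    counts = Counter(ts.hour for ts in timestamps)
    # Hours with no purchases still get a bucket, with a height of zero
    return [counts[hour] for hour in range(24)]


# Hypothetical purchase times for a single day
purchases = [
    datetime(2024, 1, 15, 9, 5),
    datetime(2024, 1, 15, 9, 40),
    datetime(2024, 1, 15, 13, 15),
    datetime(2024, 1, 15, 18, 30),
    datetime(2024, 1, 15, 18, 55),
    datetime(2024, 1, 15, 18, 59),
]

histogram = purchases_per_hour(purchases)
```

Here `histogram[18]` is 3 because three purchases fall between 18:00 and 18:59. The resulting list of 24 heights can be handed to any charting tool that draws columns.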

Composition

The composition branch of the decision tree divides into two main branches: composition over time and static composition (a snapshot).

Composition Dynamic Over Time

Composition stacked column chart

  1. Stacked Column Chart: When both the relative and absolute differences in the composition are meaningful and the periods are few, a stacked column chart is suggested. An example would be how the cost structure of a product makes up its price across four quarters.
  2. Stacked 100% Column Chart: Following the previous example, when the periods are few and only the relative differences are meaningful, a 100% stacked column chart is suggested. An example would be how a quarterly budget is composed over a year.
  3. Stacked Area Chart: When the time axis is continuous and both the relative and absolute differences are meaningful, a stacked area chart is suggested. An example could be the variation of temperature in a room, and how much is contributed by air temperature and how much by soil temperature.
  4. Stacked 100% Area Chart: Following the same reasoning as for the stacked column charts, the 100% stacked area chart is used when the analysis over time is continuous but only the relative differences are important.
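
The difference between the plain and the 100% variants comes down to one normalization step, sketched below. The cost figures are invented and `to_percent_shares` is a hypothetical helper name; a plain stacked chart would plot the absolute values as they are, while a 100% stacked chart would plot the shares this function returns.

```python
def to_percent_shares(periods):
    """Convert absolute component values into percentage shares per period.

    periods: dict mapping period -> dict mapping component -> absolute value
    """
    result = {}
    for period, components in periods.items():
        total = sum(components.values())
        if total <= 0:
            raise ValueError(f"period {period!r} has no positive total")
        result[period] = {
            name: 100.0 * value / total for name, value in components.items()
        }
    return result


# Hypothetical cost structure of a product over two quarters
costs = {
    "Q1": {"materials": 40, "labor": 40, "shipping": 20},
    "Q2": {"materials": 60, "labor": 45, "shipping": 45},
}

shares = to_percent_shares(costs)
```

In this made-up data, the materials cost grows from 40 to 60 in absolute terms, but its share stays at 40% in both quarters. A stacked column chart would show the growth, and the 100% version would hide it, which is why the choice depends on whether absolute differences matter.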

Static Composition

When we talk about static composition, we mean a chart that shows a snapshot of a certain moment in time and expresses the components of a whole value.

Composition donut chart

  1. Donut Chart: This kind of chart (which includes pie charts) expresses the different components of a variable. The whole circle represents 100% of the variable, and each section represents a piece (percentage) of the whole. A traditional example is market share.
  2. Waterfall Chart: Waterfall charts show the increments and decrements from a starting value to its final value. A very popular example is the price of a stock and its variation during a trading day.
  3. Stacked 100% Column Chart with Subcomponents: This chart is very useful for drill-down scenarios. The first column shows the components, and the next column drills down into the subcomponents of the component with the highest percentage. This repeats for the number of subcategory levels required.
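
To make the waterfall idea concrete, here is a minimal sketch of how the floating bars can be derived from a starting value and a list of changes. The stock figures are invented and `waterfall_steps` is a hypothetical helper name; each bar spans from the running total before a change to the running total after it.

```python
def waterfall_steps(start, changes):
    """Return the floating bars of a waterfall chart and the final value.

    changes: list of (label, delta) pairs applied in order to `start`.
    Each bar is (label, bottom, top), spanning the total before and after the change.
    """
    bars = []
    running = start
    for label, delta in changes:
        new_total = running + delta
        bars.append((label, min(running, new_total), max(running, new_total)))
        running = new_total
    return bars, running


# Hypothetical stock price moves during one trading day
bars, closing_price = waterfall_steps(
    100, [("morning", 5), ("midday", -8), ("close", 2)]
)
```

With these numbers the price closes at 99, and the midday bar spans from 97 to 105 because it starts where the morning bar ended and drops by 8.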

Conclusion

The Chart Chooser from Abela’s Extreme Presentation Method is a useful tool for generating chart ideas and prototyping different versions of a visualization during data analysis. The cheat sheet is not a complete assessment and is not intended as a static reference, but as an exploratory tool for trying different charts throughout your analysis.
