The modern workplace relies on more and more data in its decision-making processes. With big data and business analytics now mainstream, organizations no longer require the IT department’s guidance to analyze data; tools exist that allow any employee to connect to shared data sources and run their own analysis.
But analyzing data also requires visualizations, both to discover insights in the data and to share them with colleagues and managers. The importance of Self-Service Business Intelligence is not only exploratory: the ability to share and socialize information is a must in the data analysis process.
Data scientists have existed for decades. Even though the term is in vogue in every modern organization, vast amounts of data have always existed in fields like medicine, drug discovery, microbiological sciences, advanced finance, and biotech. The field is mature enough to have produced a set of best practices for showing data to non-scientific audiences, so that the insights obtained are understood and can help persuade others of an idea.
A Three-level Decision Tree for Selecting the Perfect Visualization for Your Data
In this article, we are showing a simple cheat sheet called the Chart Suggestions—A Thought-Starter from The Extreme Presentation™ Method created by Dr. Andrew Abela.
Selecting the perfect visualization
Abela’s cheat sheet provides a general overview that starts with a simple question: What would you like to show? The answer leads into a decision tree that determines the kind of visualization you can use for your data.
For the diagram, we are using Abela’s Chart Chooser PowerPoint Templates from SlideModel as it allows users to download the template and edit data-driven charts directly in PowerPoint.
The four main decision tree branches are based on the purpose of the visualization: relationship, comparison, distribution, and composition.
At each node, the questions focus on the nature of the information (static or dynamic), the number of variables involved, and the series and categories. The following sections dive into each branch of the tree and present some examples.
When the purpose of the chart is to show the relationship between variables or to infer some insight about that relationship, the cheat sheet proposes two charts:
- For analysis of the relationship between two variables, the suggested visualization is a simple scatter plot. Relationships can be detected through clustering (grouping points that seem to belong together) or through extrapolation (inferring a trend from the apparent behavior of the points, for example when they fall roughly along a line). An example would be the analysis of the relationship between a person’s height and weight, as in a BMI chart. Each cluster can be read as a category of the person’s current health status.
- For three variables, a useful visualization is the bubble chart. It is similar in nature to the scatter plot, but it adds the diameter of each point as a third variable. An example of this chart would be the analysis of the relationship between product price, product sales, and each product’s percentage of total sales (shown as the diameter).
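The choice between these two charts, and the bubble-size calculation from the product example, can be sketched in a few lines of Python. This is only an illustration: the function names and the sample product figures are assumptions made for this sketch, not part of Abela’s cheat sheet.

```python
def suggest_relationship_chart(num_variables: int) -> str:
    """Return the chart suggested for a relationship analysis."""
    if num_variables == 2:
        return "scatter plot"
    if num_variables == 3:
        return "bubble chart"
    raise ValueError("the relationship branch covers only two or three variables")


def bubble_shares(sales_by_product: dict[str, float]) -> dict[str, float]:
    """Return each product's percentage of total sales (usable as bubble size)."""
    total = sum(sales_by_product.values())
    if total == 0:
        raise ValueError("total sales must be non-zero")
    return {name: 100.0 * value / total for name, value in sales_by_product.items()}


# Hypothetical sales figures, for illustration only.
example_shares = bubble_shares({"A": 50, "B": 30, "C": 20})
```

Here `example_shares` holds the third variable of the bubble chart, while price and sales would go on the two axes.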
The comparison branch of the decision tree starts with a choice between comparing among items and comparing over time.
Comparing among items
- Parallel Bar Charts: Bar charts are ideal for comparing many items across different categories. An example would be to compare two car brands (categories 1 and 2) and analyze the price of each segment (each bar represents a segment such as luxury, family, etc.).
- Column Charts: Ideal for a few items and categories. Continuing the previous example, the user can chart a specific segment for both brands of cars. In these cases, the series is represented by the colors of the chart.
- Table with Embedded Charts: For comparisons of many elements and many categories, the decision tree suggests the use of tables with embedded charts. Each chart can describe the behavior of different variables in the context of the two dimensions that define the table. Continuing the price comparison of car segments across brands, you can use a table where each column represents a region (adding a new dimension, geography) and each row represents a segment. Each cell contains a bar chart with the price comparison of the two brands.
- Variable Width Column Charts: For many categories and two variables, the variable width column chart is ideal for comparing elements. The vertical and horizontal axes each carry one variable, and the height and width of every column vary along those axes. An application of such a chart would be to compare the prices and sales volumes of car segments. For example, if the width of a column represents the price and its height the sales volume, it is reasonable for the chart in the example image to step down like a waterfall: the higher the price, the lower the sales.
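The four suggestions above can be read as a small decision function. The sketch below is one possible encoding, not a definitive one: the function name and the boolean flags are hypothetical, and the article does not say where "few" ends and "many" begins or how to resolve every combination, so checking the category count before the item count is an assumption.

```python
def suggest_item_comparison_chart(
    variables_per_item: int, many_categories: bool, many_items: bool
) -> str:
    """Suggest a chart for comparing among items.

    Assumption: the caller decides what counts as "many".
    """
    if variables_per_item == 2:
        return "variable width column chart"
    if variables_per_item != 1:
        raise ValueError("this branch covers one or two variables per item")
    if many_categories:
        return "table with embedded charts"
    if many_items:
        return "bar chart"
    return "column chart"
```

For instance, one variable with few categories and few items yields a column chart, while a second variable per item leads to the variable width column chart regardless of the other flags.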
Comparing through time
- Multiple Line Charts: When you have a time dimension and many categories, multiple line charts are ideal for comparing the behavior of each category over time.
- Single Line Chart: When you have a few periods and only one category, a single line chart will be enough.
- Column Chart: Columns are ideal for comparing values, because the height of one column against another makes differences easy to see. When you want to compare over time with a few categories, the column chart is the visualization to choose.
- Circular Area Chart: Also called a spider chart, it is ideal for a few periods and cyclical data.
The first level of the distribution branch of the decision tree splits between a single variable and multiple variables.
- Column Histogram: When the analysis is for a single variable and the distribution is discrete (meaning you can define buckets), a column histogram is the most appropriate visualization. A typical example would be counting the number of purchases per hour of the day. Each hour is used as a “bucket”, and the count of purchases determines the height of each column.
- Line Histogram: When the analysis requires a single variable, but its distribution is continuous, a line histogram is preferred.
- 3D Area Chart: When the distribution analysis requires a review of three variables, a 3D area chart is the chart of choice. The user can analyze the data along three axes and read the distribution as a surface instead of just lines.
- Scatter Plot: For distribution analysis of two variables, the chart of choice is the scatter plot. The user can see how the points group together and infer whether there is a correlation that forms a pattern.
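To make the bucket idea behind the column histogram concrete, here is a minimal sketch that counts purchases per hour of the day using only the Python standard library. The function name and the sample timestamps are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime


def purchases_per_hour(timestamps: list[datetime]) -> list[int]:
    """Count purchases in 24 one-hour buckets (index 0 is midnight to 1 a.m.)."""
    counts = Counter(ts.hour for ts in timestamps)
    # A Counter returns 0 for hours with no purchases.
    return [counts[hour] for hour in range(24)]


# Hypothetical purchase times, for illustration only.
sample = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 1, 14, 0),
]
hourly_counts = purchases_per_hour(sample)
```

Each entry of `hourly_counts` is the height of one histogram column, so the list can be handed to any charting tool as is.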
The composition branch of the decision tree divides into two main branches: composition that changes over time and static composition (a snapshot).
Composition Dynamic Over Time
- Stacked Column Chart: When both the relative and absolute differences in the composition of the data are meaningful, and the periods are few, a stacked column chart is suggested. An example would be how a product’s cost structure makes up its price across four quarters.
- Stacked 100% Column Chart: Following the reasoning of the previous example, when the periods are few and only the relative difference is meaningful, a 100% stacked column chart is suggested. An example would be how each quarterly budget is composed over a year.
- Stacked Area Chart: When time is continuous and both the relative and absolute differences are meaningful, a stacked area chart is suggested. An example could be the variation of temperature in a room, and how much is contributed by air temperature and how much by soil temperature.
- Stacked 100% Area Chart: Following the same reasoning as the stacked 100% column chart, the 100% stacked area chart is used when the analysis over time is continuous but only the relative difference is important.
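The difference between a regular stacked chart and its 100% variant comes down to whether each period is normalized to its own total. The sketch below shows that normalization under simple assumptions: the function name, the nested-dictionary layout, and the cost figures are hypothetical.

```python
def to_percent_stack(
    periods: dict[str, dict[str, float]]
) -> dict[str, dict[str, float]]:
    """Convert absolute component values into percentages of each period's total."""
    result = {}
    for period, components in periods.items():
        total = sum(components.values())
        if total == 0:
            raise ValueError(f"period {period!r} has a zero total")
        result[period] = {
            name: 100.0 * value / total for name, value in components.items()
        }
    return result


# Hypothetical cost structure per quarter, for illustration only.
absolute = {
    "Q1": {"materials": 60, "labor": 40},
    "Q2": {"materials": 30, "labor": 10},
}
relative = to_percent_stack(absolute)
```

Plotting `absolute` keeps the fact that Q2 costs less overall, while plotting `relative` hides it and shows only that the materials share grew from 60% to 75%.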
Composition Static in Time
When we talk about composition that is static in time, we are referring to a chart that shows a snapshot of a certain moment and expresses the components of a whole value.
- Donut Chart: This kind of chart (which also includes pie charts) expresses the different components of a variable. The whole circle represents 100% of the variable, and each section represents a piece (percentage) of the whole value. A traditional example is market share.
- Waterfall Chart: Waterfall charts show the increments and decrements that lead from a starting value to a final value. A very popular example of this chart is the price of a stock and its variation during a trading day.
- Stacked 100% Column Chart With Subcomponents: This chart is handy for drill-down scenarios. The first column shows the components, and the next column drills down into the component with the highest percentage. This repeats for as many levels of subcategories as required.
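For the waterfall chart, the data preparation is a running total: every bar starts where the previous one ended. The following is a minimal sketch of that calculation; the function names and the sample price changes are assumptions made for illustration.

```python
from itertools import accumulate


def waterfall_levels(start: float, changes: list[float]) -> list[float]:
    """Return the running level after each change, beginning with the start value."""
    return list(accumulate(changes, initial=start))


def waterfall_bars(start: float, changes: list[float]) -> list[tuple[float, float]]:
    """Return (from_level, to_level) for each floating bar of the waterfall."""
    levels = waterfall_levels(start, changes)
    return list(zip(levels, levels[1:]))


# Hypothetical intraday price changes for a stock opening at 100.
levels = waterfall_levels(100, [5, -3, 2])
bars = waterfall_bars(100, [5, -3, 2])
```

The last entry of `levels` is the final value, and each pair in `bars` gives the bottom and top (or top and bottom, for a decrement) of one floating column.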
The Chart Chooser from Abela’s Extreme Presentation Method is a useful tool for generating chart ideas and prototyping different versions of a visualization during data analysis. The cheat sheet is not a complete assessment, and it is not intended as a static reference; treat it as an exploratory tool for trying different charts throughout your analysis.