Perfect visualisation for your data: The modern workplace relies on more and more data in its decision-making processes. With big data and business analytics now mainstream, organizations no longer need guidance from the IT department to analyze data; tools exist that allow any employee to connect to shared data sources and run their own analysis.
But analyzing data also requires visualisations, both to discover insights in the data and to share them with colleagues and managers. The importance of Self Service Business Intelligence is not only exploratory. The ability to share and socialize information is a must in the data analysis process.
Data scientists have existed for decades. Even though the term is in vogue in every modern organization, huge amounts of data have always existed in fields like medicine, drug discovery, microbiological sciences, advanced finance and biotech. The field is mature enough to have produced a set of best practices for showing data to non-scientific audiences, so that the insights obtained are understood and help persuade the audience of an idea.
In this article we are showing a simple cheat sheet called the Chart Suggestions—A Thought-Starter from The Extreme Presentation™ Method created by Dr. Andrew Abela.
Selecting the perfect visualisation
Abela’s cheat sheet provides a general overview that starts with a simple question: What would you like to show? The answer leads into a decision tree that determines the kind of visualisation you can use.
For the diagram we are using Abela’s Chart Chooser PowerPoint Templates from SlideModel, as they allow users to download the template and edit data-driven charts directly in PowerPoint.
The four main decision tree branches are based on the purpose of the visualisation: relationship, comparison, distribution and composition.
At each node, the questions focus on the nature of the information (static or dynamic), the number of variables involved, and the series and categories. The following sections dive into each branch of the tree and present some examples.
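One way to picture such a tree is as nested question nodes. The following is a minimal Python sketch, not Abela's own notation: the `Node` class, the `walk` helper and the answer labels are hypothetical names chosen for illustration, and only the relationship branch is filled in to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A question in the decision tree, with one child per possible answer."""
    question: str
    # Each answer leads either to another Node or to a chart name (a leaf).
    answers: dict = field(default_factory=dict)


# Only the relationship branch is filled in here (assumption for brevity);
# the other branches would be built the same way.
relationship = Node(
    question="How many variables?",
    answers={"two": "scatter plot", "three": "bubble chart"},
)

root = Node(
    question="What would you like to show?",
    answers={"relationship": relationship},
)


def walk(node, replies):
    """Follow a sequence of replies from the root down to a chart name."""
    current = node
    for reply in replies:
        if not isinstance(current, Node):
            raise ValueError("reached a chart before using every reply")
        if reply not in current.answers:
            raise KeyError(f"no branch for reply {reply!r}")
        current = current.answers[reply]
    if isinstance(current, Node):
        raise ValueError(f"more replies needed: {current.question}")
    return current


print(walk(root, ["relationship", "three"]))  # bubble chart
```

Keeping the questions as data rather than as nested `if` statements makes it easy to add or reorder branches while exploring.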
When the purpose of the chart is to show the relationship between variables of data, or to infer some relationship insight, the cheat sheet proposes two charts:
- To analyze the relationship between two variables, the suggested visualisation is a simple scatter plot. Relationships can be detected through clustering (grouping points that seem to belong together) or through extrapolation (inferring a trend when the points appear to form a geometric shape such as a line). An example would be the analysis of the relationship between a person's height and weight, as in a BMI chart. Each cluster can be defined as a category of the person’s current health status.
- For three variables, a useful visualisation is the bubble chart. It is similar in nature to the scatter plot, but it adds the diameter of each point as a third variable. An example of this chart would be the analysis of the relationship between product price, product sales and each product's percentage of total sales (which would be the diameter).
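To make the bubble chart example concrete, the sketch below prepares the three variables for each product: price, sales, and share of total sales (the value that would drive the bubble diameter). The product names and figures are made up for illustration, and `bubble_rows` is a hypothetical helper name.

```python
def bubble_rows(products):
    """Return (name, price, sales, share_pct) tuples for a bubble chart.

    `products` maps a product name to a (price, sales) pair; share_pct is
    each product's sales as a percentage of total sales.
    """
    total_sales = sum(sales for _, sales in products.values())
    if total_sales == 0:
        raise ValueError("total sales must be non-zero")
    return [
        (name, price, sales, 100.0 * sales / total_sales)
        for name, (price, sales) in products.items()
    ]


# Illustrative figures only.
products = {"A": (10.0, 500), "B": (25.0, 300), "C": (40.0, 200)}
for row in bubble_rows(products):
    print(row)
```

Each tuple then supplies the horizontal position (price), the vertical position (sales) and the bubble size (share) for one point.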
The comparison branch of the decision tree starts with a choice between comparison among items and comparison over time.
Comparing among items
- Parallel Bar Charts: Bar charts are ideal for comparing elements with multiple items and different categories. An example would be to compare two car brands (categories 1 and 2) and analyze the price of each segment (each bar represents a segment such as luxury, family, etc.).
- Column Charts: Ideal for few items and categories. Building on the previous example, the user can chart a specific segment for both car brands. In this case the series are represented by the colors of the chart.
- Table with Embedded Charts: For comparisons of many elements and many categories, the decision tree suggests using a table with embedded charts. Each chart can describe the behaviour of different variables in the context of the two dimensions that define the table. Building on the previous example of comparing car segment prices across brands, you can use a table where each column is a region (adding a new dimension, geography), each row is a segment, and each cell contains a bar chart comparing the prices of the two brands.
- Variable Width Column Charts: For many categories and two variables, the variable width column chart is ideal. The two variables are mapped to the vertical and horizontal axes, so both the height and the width of each column vary. An application of such a chart would be to compare the prices and sales volumes of car segments. For example, if the horizontal axis represents the price and the height the sales volume, it is reasonable for the example chart to resemble a waterfall: the higher the price, the lower the sales.
Comparing through time
- Multiple Line Charts: When you have a time dimension and many categories, the multiple line chart is ideal to compare the behaviour of each category over time.
- Single Line Chart: When you have few periods and only one category, a single line chart is enough.
- Column Chart: Columns are ideal for comparing values, because placing one column next to another makes differences easy to see. When you want to compare over time with few categories, the column chart is the visualisation to choose.
- Circular Area Chart: Also called a spider chart, it is ideal for few periods and cyclical data.
The distribution branch of the decision tree first divides between one variable and multiple variables.
- Column Histogram: When the analysis is for a single variable and the distribution is discrete (meaning you can define buckets), a column histogram is the most appropriate visualisation. A typical example would be to count the number of purchases per hour of the day. Each hour is a “bucket”, and the count of purchases in that hour is the height of the column.
- Line Histogram: When the analysis requires a single variable, but its distribution is continuous, a line histogram is preferred.
- 3D Area Chart: When the distribution analysis involves three variables, a 3D area chart is the chart of choice. The user can analyze the data along three axes and read the distribution as an area rather than as lines.
- Scatter Plot: For distribution analysis of two variables, the chart of choice is the scatter plot. The user can see how the points group together and infer whether there is a correlation that forms a pattern.
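The purchases-per-hour example for the column histogram can be sketched in a few lines of Python. This is only an illustration: the timestamps are invented and `purchases_per_hour` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime


def purchases_per_hour(timestamps):
    """Count purchases in each hour-of-day bucket (0 to 23)."""
    counts = Counter(ts.hour for ts in timestamps)
    # Include empty hours so every bucket appears in the histogram.
    return [counts.get(hour, 0) for hour in range(24)]


# Illustrative timestamps only.
purchases = [
    datetime(2024, 1, 5, 9, 15),
    datetime(2024, 1, 5, 9, 40),
    datetime(2024, 1, 5, 13, 5),
    datetime(2024, 1, 6, 9, 59),
]
heights = purchases_per_hour(purchases)
print(heights[9], heights[13])  # 3 1
```

The resulting list holds one column height per hour, ready to be passed to whichever charting tool you use.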
The composition branch of the decision tree divides into two main sub-branches: composition over time and static composition (a snapshot).
Dynamic Composition Over Time
- Stacked Column Chart: When both the relative and the absolute differences in the composition of the data are meaningful, and the periods of time are few, a stacked column chart is suggested. An example would be how the cost structure of a product makes up its price over four quarters.
- Stacked 100% Column Chart: Following the reasoning of the previous example, when the periods are few and only the relative difference is meaningful, a 100% stacked column chart is suggested. An example would be how a quarterly budget is composed over a year.
- Stacked Area Chart: When the period is continuous and both the relative and the absolute differences are meaningful, a stacked area chart is suggested. An example could be the variation of temperature in a room, and how much is contributed by air temperature and how much by soil temperature.
- Stacked 100% Area Chart: Following the same reasoning as for the stacked 100% column chart, the 100% stacked area chart is used when the analysis over time is continuous but only the relative difference is important.
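The four cases above reduce to two yes/no questions, which the following Python sketch encodes. The function and parameter names are hypothetical, and `to_percentages` shows the rescaling that a 100% stacked chart implies; the cost figures are invented.

```python
def suggest_composition_chart(few_periods, absolute_matters):
    """Pick a chart for composition over time, per the four cases above."""
    if few_periods:
        if absolute_matters:
            return "stacked column chart"
        return "stacked 100% column chart"
    if absolute_matters:
        return "stacked area chart"
    return "stacked 100% area chart"


def to_percentages(components):
    """Rescale one period's component values so they sum to 100."""
    total = sum(components.values())
    if total == 0:
        raise ValueError("components must not sum to zero")
    return {name: 100.0 * value / total for name, value in components.items()}


# Illustrative cost structure for one quarter.
q1 = {"materials": 6, "labour": 10, "shipping": 4}
print(suggest_composition_chart(few_periods=True, absolute_matters=False))
print(to_percentages(q1))
```

Applying `to_percentages` to every period gives the data for a 100% stacked chart, while the raw values feed the regular stacked versions.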
Static Composition at a Point in Time
When we talk about static composition, we are referring to charts that show a snapshot of a certain moment in time and express the components of a whole value.
- Donut Chart: This kind of chart (which also includes pie charts) expresses the different components of a variable. The whole circle represents 100% of the variable. Each section represents a piece (percentage) of the whole value. A traditional example is market share.
- Waterfall Chart: Waterfall charts show the increments and decrements from a starting value to its final value. A very popular example of this chart is the price of a stock and its variation during a trading day.
- Stacked 100% Column Chart With Subcomponents: This chart is very useful for drill-down scenarios. The first column shows the components, and the next column drills down into the component with the highest percentage to show its subcomponents. This repeats for the number of subcategory levels required.
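For the waterfall chart described above, each floating bar spans from the running total before a change to the running total after it. A minimal Python sketch of that arithmetic follows; `waterfall_bars` is a hypothetical helper, the price moves are invented, and `accumulate(..., initial=...)` assumes Python 3.8 or later.

```python
from itertools import accumulate


def waterfall_bars(start, changes):
    """Return (bottom, top) for each floating bar of a waterfall chart."""
    # Running totals: the start value followed by the total after each change.
    totals = list(accumulate(changes, initial=start))
    return [(min(a, b), max(a, b)) for a, b in zip(totals, totals[1:])]


# Illustrative intraday price moves for a stock that opens at 100.
changes = [5, -3, 4, -8]
bars = waterfall_bars(100, changes)
closing = 100 + sum(changes)
print(bars)     # [(100, 105), (102, 105), (102, 106), (98, 106)]
print(closing)  # 98
```

Whether a bar represents an increment or a decrement is given by the sign of the corresponding entry in `changes`, which is typically used to color the bar.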
The Chart Chooser from Abela’s Extreme Presentation Method is a useful tool for generating chart ideas and prototyping different versions of a visualisation during data analysis. The cheat sheet is not a complete assessment and is not meant to be a static reference; it is better treated as an exploratory tool for trying different charts throughout your analysis.