World Development Indicators – Tableau Dashboard

This dashboard is one I put together as a basic visualization exercise. The aim was to provide simple visualizations of key health and economic trends. The indicators I chose to focus on were GDP in 2017, average life expectancy for men and women between 1960 and 2017, and the top 10 countries by average healthcare expenditure since 2000. The data came from the ‘World Indicators’ dataset, which was used in Week Ten of the 2019 #MakeoverMonday Tableau challenge: https://www.makeovermonday.co.uk/data/data-sets-2019/

As a first attempt at creating a dashboard, the result was mixed. The charts could generally display information more clearly, and although the dashboard on Tableau Public is somewhat clearer than the image below, ease of use remains something to work on in the future. The full dashboard, including individual sheets, can be found here: https://public.tableau.com/profile/scott.davies#!/vizhome/SummaryofKeyHealthandEconomicMeasures/Dashboard1?publish=yes



There are some interesting insights to be drawn from this dashboard. Firstly, the trend of life expectancy increasing for both men and women over the last several decades is clear. For both men and women, life expectancy has risen by several years, from a worldwide average below 60 years in 1960 to around 70 years as of 2017. A follow-up visualization and analysis could break these statistics down by region, or compare them against national GDP.
The top 10 countries by average health expenditure showed a surprising result: they were primarily small island nations such as Nauru, Kiribati and the Marshall Islands. However, nations with a very high overall GDP, such as Germany, were also present in the top 10. The factors behind this are worth further consideration and would make an ideal topic for a follow-up analytical project.

While the dataset has a sufficient number of measures and dimensions for analysis, it was not without issues. There were notable gaps, with some measures and some years missing considerable amounts of data, which limited the range of analysis that could be conducted. In addition, while the measures were mostly self-explanatory, there was no data dictionary to accompany the dataset, which made analysis somewhat more difficult.

Overall, working with this dataset was a worthwhile exercise as a new practitioner of Tableau. It sharpened both my visualization skills and my analytical skills in exploring a dataset for insights to present.

The Importance of Data Cleaning and Preparation

For the first of my ‘portfolio’ posts, I am going to discuss one of the major stumbling blocks that I, and many others starting out in fields such as business intelligence and data science, have come across. Data cleaning and preparation is among the most important parts of the lifecycle of any business intelligence or data science project. It is estimated that around 80% of the work of people in these fields relates to data cleaning and preparation in some way. Unfortunately, despite its obvious importance, it is often overlooked in university programs, online courses and learning materials in general.

Often, introductory courses will look at more exciting parts of business intelligence and data science, such as data visualization and machine learning. To an extent, this is understandable. These topics are useful ‘hooks’ to get beginners started on interesting and engaging tasks. However, without learning how to clean and prepare data, thoroughly understanding and being able to work through all the stages of a project is not feasible. Insufficient data cleaning and preparation will also compromise the final results obtained. As the saying goes, ‘Garbage In – Garbage Out’.

In this section of the article, I will go through some general principles and best practices for data cleaning and preparation. There are of course many more techniques and advanced concepts within this area, but they are beyond the scope of this article. I intend this to be a starting point for people who, like myself, are new to data-related fields and want to get an idea of how to clean and prepare data.

When a dataset is obtained, the first thing to do is an exploratory analysis: get a feel for the data within it. One of the first checks during this analysis is that the entries are valid. For example, do the fields that require a number actually contain a numerical entry? On a similar note, entries should also make sense within the dataset provided, which requires a bit of domain knowledge of the subject of the data. For example, if looking at a dataset of wages, do the amounts make sense? If the average value within the dataset is, say, $100,000, and there is an entry of $1,000,000, there is a good chance it is an incorrect entry. However, this all depends on the context of the dataset.
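As a minimal sketch of these checks, using only the standard library: the wages example above could be screened for non-numeric entries and for values far from the average. The records and the 2× threshold here are assumptions for illustration, not a general rule.

```python
# Hypothetical wage records; the field names are invented for this sketch.
rows = [
    {"name": "A", "wage": "52000"},
    {"name": "B", "wage": "61000"},
    {"name": "C", "wage": "1000000"},       # suspiciously large
    {"name": "D", "wage": "not recorded"},  # non-numeric entry
]

# Validity check: do fields that should be numeric actually parse as numbers?
valid, invalid = [], []
for row in rows:
    try:
        valid.append(float(row["wage"]))
    except ValueError:
        invalid.append(row)

# Sanity check: flag entries well above the average as candidates for review.
mean_wage = sum(valid) / len(valid)
outliers = [w for w in valid if w > 2 * mean_wage]

print(f"{len(invalid)} non-numeric entries, {len(outliers)} possible outliers")
```

Whether an outlier is an error or a genuine extreme value still has to be decided from context, as noted above; the code only surfaces candidates for review.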

Duplicate and null entries are also a priority to check for during this stage. Particularly with larger datasets, these entries are likely to arise at some point. They can often be overlooked as they are not always as obvious to find, particularly at an initial glance of a dataset.
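Both checks can be done with a few lines of standard-library Python; the records below are invented for illustration.

```python
# Hypothetical (country, year, expenditure) records.
records = [
    ("Germany", 2017, 4.5),
    ("Nauru", 2017, None),   # null value
    ("Germany", 2017, 4.5),  # exact duplicate
    ("Kiribati", 2017, 3.1),
]

# Find exact duplicates: any record seen more than once.
seen, duplicates = set(), []
for rec in records:
    if rec in seen:
        duplicates.append(rec)
    seen.add(rec)

# Find records containing a null (None) in any field.
nulls = [rec for rec in records if any(v is None for v in rec)]

print(f"{len(duplicates)} duplicates, {len(nulls)} records with nulls")
```

In practice near-duplicates (e.g. the same country spelled two ways) are harder to catch than exact ones, which is one reason these entries are easy to overlook at a first glance.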

Wikipedia provides a useful summary of the dimensions of data quality. They are as follows:

  • Validity (Do measures conform to defined rules or constraints?)
  • Accuracy (Do measures conform to a standardized value?)
  • Completeness (Are all the required measures known?)
  • Consistency (Are the recorded measures the same across the dataset?)
  • Uniformity (Does the dataset use the same units of measurement?)
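The dimensions above can be made concrete as simple per-record checks. The sketch below assumes invented field names and rules for a life-expectancy record; consistency is omitted since it requires comparing records across the dataset rather than inspecting one record.

```python
def check_record(rec, known_countries):
    """Return a list of quality issues found in a single record."""
    issues = []
    # Validity: does the year conform to a defined constraint?
    if not (1960 <= rec["year"] <= 2017):
        issues.append("invalid year")
    # Accuracy: does the country match a standardized reference list?
    if rec["country"] not in known_countries:
        issues.append("unrecognized country")
    # Completeness: are all required measures known?
    if rec.get("life_expectancy") is None:
        issues.append("missing life expectancy")
    # Uniformity: does the record use the dataset's unit of measurement?
    if rec.get("unit") != "years":
        issues.append("non-standard unit")
    return issues

record = {"country": "Germnay", "year": 2017,
          "life_expectancy": 81.0, "unit": "years"}
print(check_record(record, {"Germany", "Nauru", "Kiribati"}))
# prints ['unrecognized country'] -- the misspelled country fails the accuracy check
```

Real validation rules would come from a data dictionary; as noted earlier, this dataset lacked one, which is exactly when ad-hoc checks like these become necessary.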


An awareness of these dimensions of data quality, and some preliminary work to ensure they are adhered to in the preparation and cleaning stages, can vastly improve the final results of a project, as well as save a lot of time by avoiding confusion and errors in its later stages. It can take a while to become accustomed to this work, and it can be tedious at times, but establishing good data cleaning and preparation practices is one of the most valuable things any beginner to business intelligence and data science can do.