We recently designed and built a data portal for Leonard Cheshire and the Department for International Development, which collects disaggregated data from multiple sources across 40 countries. The data is presented on the portal as a series of visualisations across 16 development indicators, grouped into four themes: inclusive education, economic empowerment, technology/innovation and stigma/discrimination.
The portal presented us with a few technical challenges which we thought we would share.
Data import & storage
Leonard Cheshire needed to be able to update the data easily, by themselves and on a regular basis. The data was also highly hierarchical, with 3 levels of information:
- Country - which includes multiple indicators
- Indicator - which includes multiple disaggregates
- Disaggregate - the lowest level of the hierarchy
We used CSV, with one file for each of the above levels, so that editors could update only the data they needed to. When the files are uploaded and imported, these hierarchical relationships are re-created and then cached by the website, so that page requests to view the data are as efficient as possible.
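A minimal Python sketch of this import step is below. The post does not describe the actual implementation, so the file layouts and column names (`country_code`, `indicator_id`, `theme`, `group`, `value`) and the functions `build_hierarchy`, `import_upload` and `get_country` are all hypothetical; the in-memory dictionary simply stands in for whatever cache the site really uses.

```python
import csv
import io

# Hypothetical file layouts (the real column names are not given in the post):
#   countries.csv:     country_code,name
#   indicators.csv:    country_code,indicator_id,theme
#   disaggregates.csv: country_code,indicator_id,group,value

def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

def build_hierarchy(countries_csv, indicators_csv, disaggregates_csv):
    """Re-create the country -> indicator -> disaggregate tree from three flat CSVs."""
    tree = {}
    for row in read_rows(countries_csv):
        tree[row["country_code"]] = {"name": row["name"], "indicators": {}}
    for row in read_rows(indicators_csv):
        # Each indicator row is attached to its parent country by country_code.
        country = tree[row["country_code"]]
        country["indicators"][row["indicator_id"]] = {
            "theme": row["theme"],
            "disaggregates": {},
        }
    for row in read_rows(disaggregates_csv):
        # Each disaggregate row is attached to its parent indicator.
        indicator = tree[row["country_code"]]["indicators"][row["indicator_id"]]
        indicator["disaggregates"][row["group"]] = row["value"]
    return tree

# Hypothetical in-process cache: rebuilt once per upload, then read by every page request.
_cache = {}

def import_upload(countries_csv, indicators_csv, disaggregates_csv):
    _cache["tree"] = build_hierarchy(countries_csv, indicators_csv, disaggregates_csv)
    return _cache["tree"]

def get_country(code):
    """Serve a country's data from the cache without re-reading any CSV."""
    return _cache["tree"][code]
```

Keying each level by its parent's identifiers is what lets one flat file per level be joined back into a tree. In this sketch an unknown parent key raises `KeyError`, which is exactly the kind of problem an upload-time validation step would want to catch first.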
Data wrangling & empty values
An interesting issue arose around 'missing data': how should missing numerical data be represented on the website? Initially, these cases were shown as '0'. We soon realised this was a poor representation of the information, as '0' could be read as 'we gathered the data and it was zero' rather than 'we did not gather the data, because none was available'. Since one of the key purposes of the site was to highlight gaps in the availability of disability data, this was a particularly important distinction to make clear. These empty values were therefore represented as 'N/A', helping to highlight key areas and countries with missing data.
On the front end of the site, N/A was displayed as ‘No Data’, and this message was placed directly on the chart where the percentage value (and its coloured bar) would otherwise have appeared.
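The distinction between 'no data' and a measured zero can be preserved by parsing empty or 'N/A' cells to `None` rather than `0`, and letting the renderer decide what to show. This Python sketch assumes that convention; `parse_value`, `render_bar` and the CSS class names are hypothetical and not taken from the portal.

```python
from html import escape

def parse_value(cell):
    """Parse a CSV cell into a float, or None when no data was collected.

    Empty cells and 'N/A' both become None, so a genuinely measured 0
    stays distinguishable from a missing value.
    """
    text = cell.strip()
    if text == "" or text.upper() == "N/A":
        return None
    return float(text)

def render_bar(label, value):
    """Render one chart row: a percentage bar, or a 'No Data' message in its place."""
    if value is None:
        return (f'<div class="bar-row"><span>{escape(label)}</span>'
                '<span class="no-data">No Data</span></div>')
    return (f'<div class="bar-row"><span>{escape(label)}</span>'
            f'<span class="bar" style="width: {value}%">{value}%</span></div>')
```

Because `0.0` and `None` take different branches, a country that reported zero still gets a (zero-width) bar and a "0.0%" label, while a country with no data gets the message instead.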
Dynamic Data Display
In web development (especially when using a CMS), it is very common to render content with HTML templates whose structure reflects the content hierarchy. As an analogy, the template is like a mould: a pre-existing structure into which different parts of the data 'flow', each into its own specific area.
For this dataset, the data available varied considerably between countries. A country could have data for all four indicator themes or for just one. Similarly, a particular indicator might have many disaggregates for one country and none at all for another. To handle these and other variations, the final templates ended up being minimal, so that the structure of any given page was determined almost entirely by the data itself rather than by a template.
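The sketch below illustrates letting the data drive the page structure: a section is emitted only for themes that have indicators, and a list only for indicators that have disaggregates. It assumes a simple nested dictionary as input; `render_country` and the HTML it produces are hypothetical.

```python
from html import escape

def render_country(name, themes):
    """Build a country page whose structure comes from the data, not a fixed template.

    `themes` maps theme name -> {indicator name -> {disaggregate -> value}}.
    Themes, indicators and disaggregate lists are only emitted when present.
    """
    parts = [f"<h1>{escape(name)}</h1>"]
    for theme, indicators in themes.items():
        if not indicators:
            continue  # no indicators for this theme: emit no section at all
        parts.append(f"<section><h2>{escape(theme)}</h2>")
        for indicator, disaggregates in indicators.items():
            parts.append(f"<h3>{escape(indicator)}</h3>")
            if disaggregates:  # some countries have many disaggregates, some none
                parts.append("<ul>")
                for group, value in disaggregates.items():
                    parts.append(f"<li>{escape(group)}: {escape(str(value))}</li>")
                parts.append("</ul>")
        parts.append("</section>")
    return "\n".join(parts)
```

The only fixed markup is the per-level wrapper; how many sections, headings and list items appear is decided entirely by what the input contains.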
The display layer - what visitors see
The disability data portal is very graph heavy, with some pages showing 10 or more bar charts. When developing this site, our main priorities were accessibility and performance, so rather than drawing the charts with a charting library, we built them with native HTML and CSS.
Another example of using native HTML and CSS is the world map on the homepage. This, too, could have been generated dynamically with a library. Instead, we chose to embed the map as an SVG in the HTML. This not only brings the same accessibility benefits but also allows the map markup to be cached and compressed. The map metadata (the colours and the statistics in the popup) is all built dynamically from the lists below. This creates a single source of truth: when the lists are updated, the map updates with them.
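One way to keep the map and the lists in step is to derive both from the same rows. The post does not say whether the metadata is built on the server or in the browser; this Python sketch assumes a server-side step, and the colour bins, `fill_for` and `map_metadata` are all hypothetical.

```python
from html import escape

# Hypothetical colour scale: lower bound of each bin (in %) -> fill colour.
BINS = [(0, "#fee5d9"), (25, "#fcae91"), (50, "#fb6a4a"), (75, "#cb181d")]
NO_DATA_FILL = "#cccccc"

def fill_for(value):
    """Pick the colour of the highest bin whose lower bound the value reaches."""
    if value is None:
        return NO_DATA_FILL
    colour = BINS[0][1]
    for lower, fill in BINS:
        if value >= lower:
            colour = fill
    return colour

def map_metadata(rows):
    """Derive each country's SVG fill and popup text from the same rows the lists use.

    rows: iterable of (country_code, country_name, value or None)
    """
    meta = {}
    for code, name, value in rows:
        stat = "No Data" if value is None else f"{value}%"
        meta[code] = {"fill": fill_for(value), "popup": f"{escape(name)}: {stat}"}
    return meta
```

Each entry could then be written onto the matching SVG `<path>` as its fill and popup text, so editing a list row is the only change needed to update the map.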
While developing the data portal, making the data and content accessible was at the forefront of our minds. The website has a keyboard-accessible “skip to main content” link, which jumps the user down to the main content, and a “back to top” link in the footer. We also regularly ran the site through audit tools such as Google’s Lighthouse to check that it met their accessibility guidelines.
The biggest challenge of this project (apart from a tight timeline) was creating a robust site with this much complexity while still giving the client a way to self-serve: adding their own data, having it validated for errors and publishing it without our involvement.
Because the data is in CSV format, site editors can add new countries or indicators simply by adding to a spreadsheet and uploading it to the Content Management System. The downside is that any errors in the spreadsheet would be published on the site. We tackled this by adding extensive validation and error checking to the data processing stage. Checks on indicator text formatting and on the relationships between the various spreadsheets meant that if a sheet contained errors, the new data would not be published; instead, an email was sent to the editors outlining exactly what had failed.
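A minimal sketch of this validate-then-publish flow follows. The specific rules (unknown country codes, missing indicator IDs, disaggregates without a matching indicator, non-numeric values other than 'N/A'), the function names and the email wording are all assumptions for illustration. The sketch only composes the email with the standard library's `email.message.EmailMessage`; actually sending it (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage

def validate(countries, indicators, disaggregates):
    """Check the three sheets against each other; return a list of error messages.

    Each argument is a list of row dicts (as csv.DictReader produces).
    An empty list means the upload is safe to publish.
    """
    errors = []
    country_codes = {row["country_code"] for row in countries}
    indicator_keys = set()
    # start=2 because line 1 of each CSV is the header row
    for line, row in enumerate(indicators, start=2):
        if row["country_code"] not in country_codes:
            errors.append(f"indicators.csv line {line}: unknown country '{row['country_code']}'")
        if not row["indicator_id"].strip():
            errors.append(f"indicators.csv line {line}: missing indicator_id")
        indicator_keys.add((row["country_code"], row["indicator_id"]))
    for line, row in enumerate(disaggregates, start=2):
        key = (row["country_code"], row["indicator_id"])
        if key not in indicator_keys:
            errors.append(f"disaggregates.csv line {line}: no indicator {key[1]!r} for {key[0]!r}")
        value = row["value"].strip()
        if value and value.upper() != "N/A":  # empty and N/A mean 'no data', not an error
            try:
                float(value)
            except ValueError:
                errors.append(f"disaggregates.csv line {line}: {value!r} is not a number")
    return errors

def failure_email(errors, to_addr):
    """Compose (but do not send) an email listing every validation failure."""
    msg = EmailMessage()
    msg["Subject"] = "Data upload rejected: nothing was published"
    msg["To"] = to_addr
    msg.set_content("The upload failed validation:\n\n" + "\n".join(errors))
    return msg

def process_upload(countries, indicators, disaggregates, publish, editor_email):
    """Publish only when validation passes; otherwise return the email to send."""
    errors = validate(countries, indicators, disaggregates)
    if errors:
        return failure_email(errors, editor_email)
    publish(countries, indicators, disaggregates)
    return None
```

Collecting every error before deciding, rather than stopping at the first, is what lets a single email tell editors everything they need to fix in one pass.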