The technical challenges in building a data portal

We recently designed and built a data portal for Leonard Cheshire and the Department for International Development. It collects disaggregated data from multiple sources across 40 countries. This data is presented on the portal in a series of visualizations across 16 development indicators, grouped into the four themes of inclusive education, economic empowerment, technology/innovation and stigma/discrimination.

The portal presented us with a few technical challenges which we thought we would share.

Data import & storage

Leonard Cheshire needed to be able to update data easily and by themselves on a regular basis. The data was also very hierarchical, with three levels of information:

Country - which includes multiple indicators
Indicator - which includes multiple disaggregates
Disaggregate

We used a CSV format which included one CSV for each of the above levels to make it easy to update only the data needed. As the data is imported on upload, these hierarchical relationships are re-created and then cached in the website to make page requests to view the data as efficient as possible.
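As an illustration of the import step, here is a minimal sketch of rebuilding the three-level hierarchy from one CSV per level. The column names (country_code, indicator_id, theme, group, value) and all sample rows are invented placeholders, not the portal's real spreadsheets or figures, and the real import may well work differently.

```python
import csv
import io
from collections import defaultdict

# Placeholder data: column names and values are assumptions for illustration only.
COUNTRIES_CSV = """country_code,country_name
AA,Country A
BB,Country B
"""

INDICATORS_CSV = """country_code,indicator_id,theme,value
AA,indicator_1,inclusive education,71.5
AA,indicator_2,economic empowerment,
BB,indicator_1,inclusive education,64
"""

DISAGGREGATES_CSV = """country_code,indicator_id,group,value
AA,indicator_1,female,69.2
AA,indicator_1,male,73.9
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def build_hierarchy(countries_csv, indicators_csv, disaggregates_csv):
    """Nest disaggregates inside indicators, and indicators inside countries."""
    # Group disaggregates by the (country, indicator) pair they belong to.
    disaggregates = defaultdict(list)
    for row in read_rows(disaggregates_csv):
        key = (row["country_code"], row["indicator_id"])
        disaggregates[key].append({"group": row["group"], "value": row["value"]})

    # Group indicators by country, attaching their disaggregates (possibly none).
    indicators = defaultdict(list)
    for row in read_rows(indicators_csv):
        key = (row["country_code"], row["indicator_id"])
        indicators[row["country_code"]].append({
            "id": row["indicator_id"],
            "theme": row["theme"],
            "value": row["value"],
            "disaggregates": disaggregates.get(key, []),
        })

    # The resulting dict is the kind of structure that could be cached for page requests.
    return {
        row["country_code"]: {
            "name": row["country_name"],
            "indicators": indicators.get(row["country_code"], []),
        }
        for row in read_rows(countries_csv)
    }


portal = build_hierarchy(COUNTRIES_CSV, INDICATORS_CSV, DISAGGREGATES_CSV)
```

Keeping one file per level means an editor who only has new disaggregate figures only needs to touch that one sheet. The nested structure is rebuilt from the shared keys on import.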

Data wrangling & empty values

An interesting issue appeared regarding 'missing data': how should missing numerical data be represented on the website? Initially, these cases were shown as '0'. Yet we realised this was not a good representation of the information, as '0' could be interpreted as 'we gathered the data and it was zero' as opposed to 'we did not gather the data' or 'no data was available'. Given that one of the key purposes of the site was to highlight gaps in the availability of disability data, this was a particularly important distinction to make clear on the website. These empty values were therefore represented as 'N/A', helping to highlight key areas and countries with missing data.

In the front end of the site, N/A was translated as ‘No Data’ and this message was placed directly on the charts where there would have been a percentage value (and corresponding coloured bar).
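The distinction between a real zero and a missing value can be sketched in a few lines. This is an illustrative example only: the function names are hypothetical, and it assumes missing cells arrive either empty or as the text 'N/A'.

```python
from typing import Optional


def parse_value(raw: str) -> Optional[float]:
    """Return a number, or None when the cell is empty or marked N/A.

    None means 'no data was available', which is different from 0.0.
    """
    cleaned = raw.strip()
    if cleaned == "" or cleaned.upper() == "N/A":
        return None
    return float(cleaned)


def display_value(value: Optional[float]) -> str:
    """Format a value for a chart label, keeping 'No Data' distinct from 0%."""
    if value is None:
        return "No Data"
    return f"{value:g}%"


print(display_value(parse_value("0")))    # a measured zero
print(display_value(parse_value("N/A")))  # a gap in the data
```

The key design point is that the missing case is carried through as None rather than being coerced to a number at import time. That way the display layer can still tell the two cases apart.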

Dynamic data display

In web development (especially when using a CMS) it is very common to use HTML templates to render content, with content hierarchies and structures reflected in the template. As an analogy, the template is like a mould, a pre-existing structure where different aspects of the data 'flow' into different and specific areas within it.

For this dataset there was a high degree of variability in the data available between countries. Out of the four indicator themes, a country could have all four represented or just one. Similarly, a particular indicator might have many disaggregates for one country and none at all for another. To address these and other variations, the final templates ended up being minimal, so the structure of any given page view was almost entirely determined by the data itself rather than by a template.
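To make the idea of data-driven structure concrete, here is a minimal sketch that renders a plain-text outline for a country by walking whatever themes, indicators and disaggregates happen to be present. The dictionary shape and field names (theme, label, group) are assumptions for illustration, not the portal's actual data model or templates.

```python
def render_country(country: dict) -> str:
    """Build an outline whose shape follows the data, not a fixed template."""
    lines = [country["name"]]

    # Group indicators by theme; only themes that have data appear at all.
    themes = {}
    for indicator in country.get("indicators", []):
        themes.setdefault(indicator["theme"], []).append(indicator)

    for theme, indicators in themes.items():
        lines.append(f"  {theme}")
        for indicator in indicators:
            lines.append(f"    {indicator['label']}")
            # An indicator with no disaggregates simply produces no child rows.
            for item in indicator.get("disaggregates", []):
                lines.append(f"      {item['group']}")
    return "\n".join(lines)


# Hypothetical countries with very different amounts of data.
sparse = {
    "name": "Country B",
    "indicators": [{"theme": "stigma/discrimination", "label": "Indicator X"}],
}
rich = {
    "name": "Country A",
    "indicators": [
        {
            "theme": "inclusive education",
            "label": "Indicator Y",
            "disaggregates": [{"group": "female"}, {"group": "male"}],
        },
        {"theme": "economic empowerment", "label": "Indicator Z"},
    ],
}

print(render_country(sparse))
print(render_country(rich))
```

The same function handles both countries without any special cases, because nothing in it assumes how many themes or disaggregates exist.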

The display layer - what visitors see

The disability data portal is very graph heavy, with some pages showing 10 or more bar charts. When developing this site, our main priorities were accessibility and performance.

When developing graphs for websites, the first instinct is to reach for a JavaScript library to display the data and, in some cases, we have done this. However, the main bar charts are created entirely using CSS, with the JavaScript charts only being loaded when and where required.

Avoiding the use of a JavaScript library for the majority of charts instantly makes the data more accessible. The figures and graph headings are embedded in the HTML, making it easier for screen readers and other assistive technologies to access the data. This also has a speed and performance benefit, as the browser is aware of what it needs to create as it loads the page - it doesn’t need to revisit or redraw any elements once everything has been displayed.
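As a rough sketch of the technique, the following generates the markup for a single CSS-only bar: the label and figure are real text in the HTML, and the bar length is just a percentage width. The element choice, class names and inline style are assumptions for illustration, not the markup used on the portal.

```python
from html import escape
from typing import Optional


def render_bar(label: str, percent: Optional[float]) -> str:
    """Return HTML for one bar; a stylesheet is assumed to colour the .bar element."""
    heading = f"<dt>{escape(label)}</dt>"
    if percent is None:
        # Missing data gets a message instead of a zero-width bar.
        return f'{heading}<dd class="bar bar--empty">No Data</dd>'
    # Clamp only the drawn width; the printed figure stays as supplied.
    width = max(0.0, min(100.0, percent))
    return f'{heading}<dd class="bar" style="width: {width:g}%">{percent:g}%</dd>'


print(render_bar("Female", 69.2))
print(render_bar("Male", None))
```

Because the figure is ordinary text inside the element, a screen reader encounters it in document order. The browser can also lay the bar out in its first pass, without waiting for a script.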

Some graphs cannot be recreated using CSS alone, and radar charts are a perfect example of this. Where necessary, we have used the Highcharts JavaScript charting library, which offers a free licence for non-profits, to display data. We have used radar charts to enhance the site and allow the user to consume the data in a different format. Those browsing the website without JavaScript enabled do not miss out on any data, because the radar charts are an alternative to the bar charts rather than a replacement for them.

Another example of using native HTML and CSS is the world map on the homepage. This, again, could have been created dynamically using a library. However, we chose to embed the map as an SVG in the HTML. This not only carries the accessibility benefits but also allows the map code to be cached and compressed. The map metadata (the colours and statistics in the popup) is all built dynamically from the lists shown below the map. This creates a central source of truth: when the lists update, the map updates along with them.

Because of the dynamic nature of the map and the slightly more complex interactions, we decided to hide it for both users without JavaScript enabled and those on a smaller screen. Once again, however, the user is not missing out on accessing any information as the lists are still present.

While developing the data portal, ensuring the data and content were accessible was at the forefront of our minds. The website features a keyboard-accessible “skip to main content” link, which jumps the user down to the main content, along with a “back to top” link in the footer. We also regularly ran the website through audit tools such as Google’s Lighthouse to check that we were following their recommendations.

In summary

The biggest challenge of this project (apart from a challenging timeline) was creating a robust site with a lot of complexity, while still providing the client with a way of self-serving: being able to add their own data, have it validated for errors and publish it without our involvement.

By using a CSV format, the site editors can easily add new countries or indicators by simply adding to a spreadsheet and uploading this sheet to the Content Management System. The downside of this is that errors in the spreadsheet could be published on the site. We tackled this by adding lots of validation and error checking during the data processing stage. Checks on indicator text formatting and on the relationships between the various spreadsheets make sure that, if there are errors in the sheets, the new information is not published. Instead, an email is sent to the editors outlining exactly what has failed.
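The validate-then-publish flow described above could look something like the following sketch. It is illustrative only: the column names are assumed, the two checks shown (numeric values and parent/child relationships between sheets) stand in for whatever rules the real import applies, and publish and notify are hypothetical callbacks in place of the CMS and the email step.

```python
def validate(countries, indicators, disaggregates):
    """Return a list of human-readable errors; an empty list means the data is safe."""
    errors = []
    country_codes = {row["country_code"] for row in countries}

    indicator_keys = set()
    # Start at 2 so reported numbers match spreadsheet rows (row 1 is the header).
    for n, row in enumerate(indicators, start=2):
        if row["country_code"] not in country_codes:
            errors.append(f"indicators row {n}: unknown country '{row['country_code']}'")
        value = row["value"].strip()
        if value and value.upper() != "N/A":
            try:
                float(value)
            except ValueError:
                errors.append(f"indicators row {n}: value '{value}' is not a number")
        indicator_keys.add((row["country_code"], row["indicator_id"]))

    for n, row in enumerate(disaggregates, start=2):
        if (row["country_code"], row["indicator_id"]) not in indicator_keys:
            errors.append(f"disaggregates row {n}: no matching indicator '{row['indicator_id']}'")
    return errors


def import_if_valid(countries, indicators, disaggregates, publish, notify):
    """Publish only when every check passes; otherwise report all failures at once."""
    errors = validate(countries, indicators, disaggregates)
    if errors:
        notify(errors)
        return False
    publish(countries, indicators, disaggregates)
    return True
```

Collecting every error before reporting, rather than stopping at the first, lets editors fix a sheet in one pass instead of uploading it repeatedly.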

View the finished project here
