The World Health Organization's new Data Description Schema provides the building blocks to harmonize and unify the diverse data handled by WHO while continuing to empower the use of bespoke tools and systems tailored to the data workflow at hand.
By consistently describing our data in the same way, the Data Description Schema generates powerful cross-platform, tool-agnostic opportunities for the whole data journey, from collection to analysis and dissemination, expressed in column titles, metadata, API queries, file names and more.
The definitions and examples below demonstrate the consistency of meaning and structure that provides the foundation for powerful and dynamic applications of data.
The creation of a common data description schema ensures a greater degree of schematic data integrity, increasing interoperability across programmatic areas of work and in our interface with Member States, partners and public consumers of data. With the schema expressed within our data products, users and consumers of data will, for the first time, have a common and consistent language through which to understand and leverage data.
Through the logical and machine-readable format of the Data Description Schema, we hope to increase the accessibility, usage and actionability of data. To that end, the semantic encoding, common syntactic structure and consistent query syntax provide clear information for human users and codeable metadata for systems and machines. This will directly address feedback from country stakeholders and pain points identified through external research and interviews with users in 2021 and 2022.
The first version of the Data Description Schema is being implemented in BETA for the published data outputs of the World Health Data Hub. It is also being applied and tested across all data within WHO's Global Health Observatory and flagship Global Health Estimates before their next public release. This BETA phase provides the opportunity for refinement and testing by the Division of Data Analytics and Delivery for Impact (DDI) alongside a representative community of data managers from across WHO, including regions and countries.
The 8 May 2022 statistical release of WHO's global estimates of excess mortality associated with COVID-19 used the BETA Data Description Schema within the data set, which is available for download.
As the Data Description Schema is adopted and implemented within WHO, we are laying a consistent, tracked foundation for applications of data and partnerships, both current and future.
With the launch of the public-facing data hub, Datadot, planned for November 2022, the Data Description Schema will be adopted for all publicly released data, beginning with data exports/downloads and the API for the 1200+ health indicators within the Global Health Observatory. Beyond this, the full Data Description Schema will be released for public reference on Datadot.