Westminster Accounts: Methodology
The methodology behind the most comprehensive database built to date of the money moving into UK politics.
Sunday 8 January 2023 08:29, UK
The process of collecting, cleaning, analysing – and eventually visualising – tens of thousands of financial records gets very complicated very quickly.
The Westminster Accounts project, a collaboration between Paste BN and Tortoise Media, overwhelmingly involves data generated and reported by humans, and humans make mistakes. They add extra zeros; they forget decimal places; they spell people's names wrong; they spell their own names wrong. Even when they're not making explicit errors, people also introduce an inevitable level of variability to any dataset.
Wrangling this kind of data into a format that's usable for analysis and investigation - especially at this scale - is always going to involve making a series of decisions about how to standardise non-standard data.
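One recurring standardisation problem is that the same donor or company appears under several spellings. The methodology does not say how that was handled. The sketch below is purely illustrative and assumes a simple approach: build a normalised matching key so that variants group together while the original spelling is kept for display. The function name `standardise_key` and the `SUFFIXES` map are hypothetical.

```python
from __future__ import annotations

import re
import unicodedata

# Hypothetical abbreviation map: variants treated as the same word when matching
SUFFIXES = {"ltd": "limited"}


def standardise_key(raw: str) -> str:
    """Build a matching key so variant spellings of one name group together."""
    # Split accented characters into base letter + combining mark, then drop the marks
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove apostrophes so "O'Neill" and "ONeill" produce the same key
    text = text.replace("'", "").replace("\u2019", "")
    # Turn any other punctuation into spaces, lower-case, and split on whitespace
    text = re.sub(r"[^\w\s]", " ", text)
    words = text.lower().split()
    # Expand common abbreviations to one canonical form
    return " ".join(SUFFIXES.get(word, word) for word in words)
```

With this key, `"  ACME  Ltd."` and `"Acme Limited"` both map to `"acme limited"`. The key is only for grouping; whether two grouped names really are the same entity would still need a human check.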
Below is a record of where we found the data and how we collected it, but it's also a log of all those decisions and - where necessary - an explanation as to why we made them.
The sources
To build the database, we pulled information from three main sources, each of which corresponds to one of the groups or individuals investigated by the project:
- The Register of Members' Financial Interests (for members of parliament)
- The Register of All-Party Parliamentary Groups (for APPGs)
- The Electoral Commission's database (for political parties)
To collect the first two, we scraped each individual record from the parliament website. For the data on donations to parties, we manually downloaded the relevant files from the Electoral Commission's website. All three will be updated on an ongoing basis. Every time a new register or version is published, the data will be collected, cleaned, tested and then added to the database.
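The update cycle described above (collect, clean, test, add) can be pictured as merging each newly published register version into the existing database without duplicating entries already stored. This is a minimal sketch under stated assumptions: the `Entry` fields, the check in `passes_checks` and the idea that an identical entry is a duplicate are all hypothetical, not the project's actual schema or tests.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Entry:
    # Hypothetical fields for one register entry; frozen so entries are hashable
    member: str
    date_registered: date
    amount: float
    description: str


def passes_checks(entry: Entry) -> bool:
    # Hypothetical test step: a positive amount and a non-empty member name
    return entry.amount > 0 and bool(entry.member.strip())


def merge_version(database: set[Entry], new_entries: list[Entry]) -> int:
    """Add entries from a newly published version; return how many were new."""
    added = 0
    for entry in new_entries:
        # Skip entries that fail the checks or are already in the database
        if not passes_checks(entry) or entry in database:
            continue
        database.add(entry)
        added += 1
    return added
```

Because each new register version repeats most of the previous one, re-merging it should add only the genuinely new entries.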
In addition to these datasets, the database also includes basic information about MPs (e.g. party, constituency and gender) and parties (e.g. abbreviation and whether the party is in government), which we collected via parliament's Members API.
General methodological notes
Our guiding methodological principle for this project was to err on the side of taking register entries at face value. Where mistakes were very clear, or where we felt correcting an entry would improve the quality of the data, we did make amendments, but our threshold for doing so was high: the intended meaning of the original entry had to be obvious. This mostly allowed us to fix simple errors like typos, missing decimals and illogical dates, but it also informed other, more complex decisions like the ones below.
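As an illustration of what an "illogical date" might look like, an entry cannot have been received or registered after the register containing it was published (a year typed as 2202 instead of 2022 would be caught this way). The sketch below only flags such entries for manual review rather than correcting them, in keeping with the high threshold described above. The rule, the dictionary keys and the function name are our assumptions, not the project's documented checks.

```python
from __future__ import annotations

from datetime import date


def flag_illogical_dates(entries: list[dict], published: date) -> list[int]:
    """Return IDs of entries dated after their register was published (hypothetical rule)."""
    flagged = []
    for entry in entries:
        dates = (entry["date_received"], entry["date_registered"])
        # A missing date is left alone; only impossible future dates are flagged
        if any(d is not None and d > published for d in dates):
            flagged.append(entry["id"])
    return flagged
```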
Dates
The database is limited to the current parliament. It only includes donations and payments made, and members, All-Party Parliamentary Groups and parties operating, from 19 December 2019 onwards.
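The cut-off rule amounts to two simple filters: one on the date a payment or donation was made, and one on whether a member, APPG or party was still operating on or after the cut-off. A minimal sketch, reading "onwards" as inclusive and assuming a hypothetical `active_until` field that is `None` for anything still active; neither the field nor the function names come from the project.

```python
from __future__ import annotations

from datetime import date
from typing import Optional

# Cut-off date stated in the methodology (inclusive)
CUTOFF = date(2019, 12, 19)


def payment_in_scope(made_on: date) -> bool:
    """Keep a donation or payment made on or after the cut-off."""
    return made_on >= CUTOFF


def body_in_scope(active_until: Optional[date]) -> bool:
    """Keep a member, APPG or party still operating on or after the cut-off.

    `active_until` is a hypothetical field: None means still active today.
    """
    return active_until is None or active_until >= CUTOFF
```

A payments list would then be filtered with something like `[p for p in payment_dates if payment_in_scope(p)]`.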
In most cases, we used the "date registered" - or the date on which the payment, donation or benefit was registered with