This was an interesting one to do, further indulging my election data nerd leanings. The seat declaration dataset was made available here by the House of Commons Library with an accompanying report (note that I used the first published version, so any errors present there will be here too). The tweet data was again collected using the Search endpoint of the Twitter REST API via Tweepy (a library which greatly simplifies using the Twitter API from Python). I used the same method as in this post on the #dogsatpollingstations hashtag, and as I said there, a neater, more automated way of doing it would still be good and is still in progress.
Simply counting tweets is a very simplistic way of analysing Twitter data, and obviously much more detailed investigation would be needed before drawing any conclusions. Still, given how much was made of how Labour ‘won’ social media, it was interesting to see consistently more tweets about Labour seats than Conservative ones through the night, despite more actual seats being declared for the Conservatives. With the usual warnings about correlation and causation in mind, the spike in tweet numbers at around 4:50am was striking, and came very close to the point at which 90% of constituencies had declared according to the Commons Library report. Shortly before this, tweets about ‘hung parliament’ began to overtake those about ‘exit poll’ for the first time, suggesting that between 4am and 5am was when the realisation of what the result would actually be set in.
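For anyone curious, the counting itself needn't be complicated. This is a minimal sketch of the idea, not my actual notebook code: the tweets and the keyword list here are made up for illustration, and it assumes the collected tweets are available as (timestamp, text) pairs.

```python
from collections import Counter
from datetime import datetime

# Hypothetical sample of collected tweets: (created_at, text) pairs.
tweets = [
    (datetime(2017, 6, 9, 4, 45), "Another Labour hold in the north east"),
    (datetime(2017, 6, 9, 4, 50), "Hung parliament now looking certain"),
    (datetime(2017, 6, 9, 4, 52), "Exit poll was right after all"),
    (datetime(2017, 6, 9, 4, 55), "Conservative gain from SNP"),
]

def count_by_keyword(tweets, keywords):
    """Count tweets mentioning each keyword (case-insensitive substring match)."""
    counts = Counter()
    for _, text in tweets:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                counts[kw] += 1
    return counts

counts = count_by_keyword(
    tweets, ["labour", "conservative", "hung parliament", "exit poll"]
)
```

Bucketing the same counts by hour (or by ten-minute interval) using the timestamps is what produces the time series plotted through the night.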
Plotting the tweets against the actual declaration times was intriguing, but the ‘high profile’ declarations didn’t really correlate with spikes in tweets for the relevant party as I had thought they might. Nor could I identify any pattern between declaration time and party, or percentage vote share. Plotting the constituencies by party in this way did reveal some interesting information, though: the highest percentage vote shares (> 70%) tended to be Labour, while in seats won by the Lib Dems and SNP the share tended to be lower (< 50%).
Picking out some of the ‘extremes’ was informative too; Buckingham, for example, is the constituency of The Speaker of the House of Commons, who stands independently of any party and is traditionally not opposed in the constituency by the main parties. Furthermore, the Speaker does not vote in Parliament except to break ties, meaning the people of the constituency essentially do not get the same representation as the rest of the country. This may explain why the constituency also had the largest number of invalid votes (as was also the case in 2015), and it would be interesting to investigate whether the votes were deemed invalid because electors had actively spoiled their ballots in protest, or were simply confused as to what their choices were.
The making of
The Commons Library data is provided as two tables, and I wanted to combine them to get the time, vote share, and turnout in a single table. For my data fiddlings I’m using the virtual machine handily provided with TM351 (the Open University’s data analysis course), which comes with a PostgreSQL database already set up, but through the course I found it a bit fiddly to interact with from the Jupyter notebook environment. Since I know where I am with Microsoft SQL Server (and hey, I’m not being assessed here), I ended up doing a bulk insert of the CSV files into two new tables, combining the two with a SQL join, and then copying the result to a new CSV, ready to import back into the notebook. The data didn’t require much cleaning, but I altered some constituency names to remove the commas as they were causing me CSV import headaches.
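For anyone who'd rather stay inside the notebook, the same join can be done with pandas instead of a database round-trip. This is a sketch only: the column names and the two-row stand-in CSVs are invented for illustration, not the real Commons Library layout. (Quoted fields in `read_csv` also sidestep the comma-in-name headache.)

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two Commons Library CSVs; the real
# files have more columns and these names are assumptions.
times_csv = io.StringIO(
    "constituency,declaration_time\n"
    "Newcastle upon Tyne Central,23:01\n"
    "Houghton and Sunderland South,23:15\n"
)
results_csv = io.StringIO(
    "constituency,vote_share,turnout\n"
    "Newcastle upon Tyne Central,64.9,67.7\n"
    "Houghton and Sunderland South,59.5,60.7\n"
)

times = pd.read_csv(times_csv)
results = pd.read_csv(results_csv)

# The equivalent of the SQL join: combine on the shared constituency column.
combined = times.merge(results, on="constituency", how="inner")

# With no path argument, to_csv returns the CSV as a string,
# ready to save or feed straight into further analysis.
csv_out = combined.to_csv(index=False)
```

An inner merge keeps only constituencies present in both tables, which mirrors what the SQL join did for me.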
Once I had the data, I spent the bulk of the time tweaking the plotting code to make the subplots work properly together, with the annotations in the right places, and learnt / relearnt plenty by revisiting old TM351 notebooks and stumbling across StackOverflow answers. Rather than dissect it all here, I have once again put the notebook, combined CSV and SQL script on GitHub here.
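The subplot-plus-annotation wrangling boils down to a pattern like the one below. To be clear, this is not the notebook code (that's on GitHub); the numbers here are invented, and it just shows the shape of the idea: stacked axes sharing an x-axis, with `annotate` pinning a label to a data coordinate.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Illustrative data only: tweet counts per hour for two search terms,
# and seats declared per hour underneath.
hours = [0, 1, 2, 3, 4, 5]
exit_poll = [120, 90, 70, 55, 40, 30]
hung_parliament = [10, 20, 35, 50, 65, 80]
declared = [5, 30, 80, 150, 120, 60]

# Two subplots sharing the x-axis so the time scales line up.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))
ax1.plot(hours, exit_poll, label="'exit poll'")
ax1.plot(hours, hung_parliament, label="'hung parliament'")
ax1.legend()
ax1.set_ylabel("tweets per hour")

ax2.bar(hours, declared)
ax2.set_ylabel("seats declared")
ax2.set_xlabel("hours after polls closed")

# annotate() takes the point to mark (xy) and where to put the text
# (xytext), both in data coordinates, with an arrow between them.
ax1.annotate("terms cross over", xy=(3.5, 52), xytext=(1.5, 100),
             arrowprops=dict(arrowstyle="->"))
fig.tight_layout()
fig.savefig("tweets.png")
```

Getting `xy` and `xytext` right per subplot was most of the fiddliness for me; each axes object keeps its own coordinate system, so annotations have to go on the right one.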