Thursday, 27 March 2014

4 social media research challenges to overcome when tackling live debates

As we enter the final furlong of the race for the Scottish independence referendum, with another General Election not far behind, much excitable talk is bubbling up once again about using social media as an election predictor. With the current fashion for presidential-style election debates, those are under the social media analysis spotlight too, with Twitter and other platforms providing a source of instant feedback and soundbites - cheaply or for free. Media organisations, research companies, political parties and casual observers alike all feast on instant statistics about who has "won". Needless to say, live debates provide a snapshot of how social media can give large-scale instant feedback - something which tickles the fancy of insight departments in companies and organisations the world over.

Last night's EU debate on LBC between Nigel Farage and Nick Clegg was a good canvas to show how there are significant challenges to such an approach. To demonstrate why, I set up a quick search for the hashtags #NickvNigel and #LBCdebate, using social media monitoring tool Brandwatch. Incidentally, this isn't a tirade against such tools, which do exactly what they're supposed to. Instead, it's a call to arms: to make this data meaningful, we need to think very carefully about the context of such data, to clean it appropriately, and to treat it with extreme caution. If we take the necessary steps, which may involve cutting out substantial proportions of the data, we may be able to get meaningful results.

The Blurrt "worm"

The LBC website has a "worm", courtesy of Blurrt. Sadly at time of writing the LBC website was creaking and the worm wasn't visible at all during the debate itself. All that was visible was the phrase "The requested URL /graphs/sentiment/ was not found on this server." The bolded word leaves me sad, but not as sad as the "how it works" page, which gives no information whatsoever on the methodology and a lot of explanation of some basic sampling theory - dressed up in such a way as to make it look intimidating to a non-technical audience whilst still explaining nothing useful. There is certainly a place for real-time analysis (although as Francesco D'Orazio points out succinctly, "If you can’t make decisions in real time there is no point in using real-time intelligence"); that real-time analysis must inevitably depend largely (or solely) on technology. As an advertisement for robust social media analysis, however, this is flawed, flawed, flawed.

There are several challenges which we need to consider.

1. Using hashtags as search terms

As this was a casual exercise, I kept my search term simple, starting with #NickvNigel (simply because this was the one appearing on my own Twitter feed) and later adding #LBCdebate, which I only spotted once it was mentioned by Nick Ferrari 10 minutes into the debate itself. It's a good thing I did, as #LBCdebate turned out to be the dominant hashtag.

This brings up one potential issue - retrospective data, which may not always be complete depending on how it's coming from Twitter. 

But there's a more fundamental problem. Almost by definition, the use of a hashtag implies prior knowledge of its existence, and generally also implies an affinity for the topic, and possibly good connections with others close to the topic. The casual LBC listener stumbling across the debate who chose to comment - very likely the non-partisan "floating voter" whom we are so anxious to identify - is unlikely to be found here. There are parallels in commercial social media research, too; do real people use hashtags like #danceponydance, or do they just talk about "the T-Mobile ad"? (Hint: that's actually not a good example, as it's a rare occurrence of a campaign that has really taken off in social media. Much to my advertising research colleagues' frustration, not to mention that of my clients, the reality is that most campaigns barely get talked about at all.)

Should we go with the easy option, or try to look at all tweets from the period referring to Clegg or Farage? Had I done the latter, the results might have been very different.

2. Coding: far from trivial

I dived in and manually coded 198 tweets. Simple, right? Not at all. There are myriad ways of doing this. This was a quick-and-dirty exercise on my part, but it's worth jotting down some of my assumptions, because even a quick-and-dirty bit of coding can rapidly prove a head-scratcher. I'm not claiming this is the "right" way to go about things! On the contrary, there are probably approaches which are far better, and some of my assumptions are probably way off the mark. For example, I could have focussed purely on tweets which made reference to the debate performance itself ("Farage is winning", "Clegg sounds nervous", etc).

I started by taking a sample of tweets using either hashtag, between 1900 (the start of the debate) and 2100 (an hour after the finish). The time period is arbitrary. My code frame was very simple: "Clegg", "Farage" or "neither". Broadly speaking, I defined "Clegg" as any tweet saying either something good about Clegg or something bad about Farage, and "Farage" vice versa; "neither" was any comment which gave nothing away. Any retweet of an official party account I automatically set to being "for" that party (mercifully both Labour and Tory HQ seemed to be very quiet); retweets of mainstream news accounts, without added comment, I set to "neither" unless the tweet reported something obviously critical. This approach was pretty self-explanatory to begin with, but there were snags aplenty.
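For anyone wanting to reproduce the mechanical parts of this, here is a minimal Python sketch. It only covers the two steps that don't need human judgement: pulling the sample (either hashtag, inside the time window) and pre-coding retweets of official party accounts. The tweet dictionary layout, the date, and the account handles are my own assumptions for illustration, not anything exported by Brandwatch; everything that returns None still has to be coded by hand.

```python
from datetime import datetime

# The hashtags come from the post. The date, the tweet dict layout and the
# account handles are assumptions made for illustration only.
HASHTAGS = {"#nickvnigel", "#lbcdebate"}
WINDOW_START = datetime(2014, 3, 26, 19, 0)   # start of the debate
WINDOW_END = datetime(2014, 3, 26, 21, 0)     # an hour after the finish
PARTY_ACCOUNTS = {"libdems": "Clegg", "ukip": "Farage"}  # assumed handles


def in_sample(tweet):
    """True if the tweet uses either hashtag and falls inside the window."""
    words = {w.lower().rstrip(".,!?:;") for w in tweet["text"].split()}
    in_window = WINDOW_START <= tweet["created_at"] <= WINDOW_END
    return bool(words & HASHTAGS) and in_window


def precode(tweet):
    """Apply only the retweet-of-party-account rule.

    Returns "Clegg" or "Farage" for a retweet of an official party account,
    and None for anything that needs a human coder.
    """
    source = (tweet.get("retweet_of") or "").lower()
    return PARTY_ACCOUNTS.get(source)
```

Note that the news-account rule ("neither" unless obviously critical) is deliberately left out: "obviously critical" is a judgement call, which is rather the point of this section.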

This tweet is clearly making a political point, but for which side?
How about this?
Or this?

(For reference, I coded those as "neither", "Clegg" and "Clegg" respectively, but I wouldn't quibble with anyone who coded them differently).

Other tweets, meanwhile, needed a good look at the context and/or embedded media/links to make an educated decision - this one is clearly pro-Farage:

3. Are opinions representative of Twitter? Of the wider population? Even of the tweeters talking about the issue?

Coding social media verbatim is tricky at the best of times, and whether a manual, automated or machine-learning approach is taken, it clearly needs a lot of thought. However, even if we assume an optimal coding strategy, there's a deeper-seated problem, and this comes back to the question which old-school market researchers always ask about social media data: But is it representative?

When asked that question, I generally fall back on a standard response: "Probably not...but does it matter?" There are so many unknowns, but survey respondents aren't exactly representative either ("yes, of course I'll spend 45 minutes for little or no reward answering questions about my mortgage provider").

The problem is not a question of demographic representativeness, but more "to what extent do the views expressed in tweets represent the views on Twitter?" The first and most obvious point is that people only tweet about stuff they care about. Hence we'll have to stick with surveys for our mortgage provider research. Do the tweets represent the underlying opinions? Probably not - it's only the things that delight/outrage people the most that actually get posted. People don't necessarily offer up unprompted opinions unless they feel the need to broadcast them.

But studying political tweets is even more problematic.

4. Activists dominate proceedings

Of the 198 tweets I analysed, 153 gave some sort of opinion one way or another. I looked at the profiles of these 153 tweeters to see if I could find anything out about them. A Twitter profile gives you 160 characters to define yourself. After going through a few, it seemed to me that they could be divided into four categories:
  • Activist
  • Politician
  • Journalist
  • Other
I decided to code anyone as an "activist" whose profile showed an obvious leaning towards a particular political party or ideology. My reasoning was that anyone who uses up some or all of their 160-character bio to state their political leanings would be likely to be pretty dyed-in-the-wool. Some were a grey area: there were plenty who were self-described as "interested in politics" whom I coded as "other", while anyone who said things like "socially liberal" or "Europhile" I placed in the "activist" bucket. "Politician" means anyone whose bio states that they are an MP, MEP, Councillor and so on; prospective candidates were problematic, although anyone who was borderline would end up in the "activist" category anyhow. "Journalists" were mostly self-explanatory.
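I did this by eye, but the rules above could be roughed out as a keyword pass over the bio text. The following Python sketch is purely illustrative: the keyword lists are my own guesses (only "MP", "MEP", "Councillor", "socially liberal" and "Europhile" come from the examples above), and a real bio would often defeat it, so treat it as a first pass before manual checking rather than a replacement for it.

```python
import re

# Keyword lists are illustrative assumptions, not the rules actually used;
# the coding described in the post was done by hand.
POLITICIAN = re.compile(r"\b(mp|mep|councillor)\b", re.IGNORECASE)
JOURNALIST = re.compile(r"\b(journalist|reporter|correspondent)\b", re.IGNORECASE)
ACTIVIST = re.compile(
    r"\b(socially liberal|europhile|eurosceptic|lib dem|ukip)\b", re.IGNORECASE
)


def categorise(bio):
    """Assign a Twitter bio to one of the four categories by keyword."""
    if POLITICIAN.search(bio):
        return "Politician"
    if JOURNALIST.search(bio):
        return "Journalist"
    if ACTIVIST.search(bio):
        return "Activist"
    # "interested in politics" and everything else falls through to here
    return "Other"
```

The order of the checks matters: a bio reading "Lib Dem MEP" is caught as a politician before the activist keywords are tried. A bio such as "fed up with my MP" would be misfiled, which is exactly the sort of thing a human coder catches.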

The breakdown of "opinionated" tweeters is as follows:
No less than 36% of the tweets were written (or retweeted) by tweeters who self-described as being politically polarised*, with another 3% being journalists.

Does that skew our sample? Of course it does - massively. There is a substantial minority of politically savvy, active cyberwarriors sticking up for their man. It's true of the #IndyRef debate as well. Never mind the demographic breakdown of Twitter - it's the propensity of people to tweet about what matters to them that is more important. The sample is biased away from casual listeners and floating voters, and towards a polarised, politically charged audience. Shortly before the debate began, Lib Dem Digital Communications lead Bess Mayhew sent out an email to supporters which said "LBC are running a “Twitter worm” which tracks who is winning the twitter battle. Nick needs your help to come out on top, so lets get tweeting!" In a world increasingly judged in this way, groups will always look for ways to game the system.

There's one further consideration to take into account which I've also not dealt with here - multiple tweets by the same person. As an example, Peter Chalinar (@TaleahPrince) tweeted nearly 200 times yesterday about the debate (mostly retweets of others) - mostly strongly in favour of Farage, whilst Lib Dem MEP Rebecca Taylor notched up nearly 150 tweets. While neither of them turned up in my sample of 198, there were several people whose tweets appeared twice. De-duplicating authors is another step in social media analysis which may be worth taking, depending on the objectives.
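As a rough illustration of that de-duplication step, here is a Python sketch that keeps only the first tweet seen from each author. The "author" field is a hypothetical name for whatever your monitoring tool calls the account handle, and "keep the first tweet" is just one possible rule; you might equally keep the last, or code each author by their majority view.

```python
def one_tweet_per_author(tweets):
    """Keep the first tweet from each author, preserving the original order.

    Each tweet is assumed to be a dict with an "author" key (hypothetical
    field name). Handles are compared case-insensitively.
    """
    seen = set()
    kept = []
    for tweet in tweets:
        author = tweet["author"].lower()
        if author not in seen:
            seen.add(author)
            kept.append(tweet)
    return kept
```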

* Of course it could be argued that anyone tuning into an hour-long programme on a political issue that isn't even considered to be in the top 10 issues facing Britain today according to the Ipsos MORI issues index would be likely to be a bit of a politics nut anyhow. 

So what about the results?

What about them? Hopefully I've demonstrated that without some careful methodological thought, the results are pretty meaningless, and my own system was not thought through in detail - I simply wanted to point out some issues. For the record, the Blurrt worm seems to have done reasonably well at picking up sentiment expressed towards particular issues as the debate went on, and called it overall in favour of Farage, mirroring the snap YouGov poll taken immediately after the debate. My own results were rather different:

Topline figures
Clegg 44%
Farage 33%
Neither 23%

Ignoring the "neithers", this boils down to
Clegg 58%
Farage 42%
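The re-basing here is simple arithmetic: drop the "neither" share and divide each remaining share by what is left. A small Python sketch, working from the rounded topline percentages above (the raw counts are not given in this post, so that is an assumption of the example):

```python
def rebase(shares, exclude):
    """Re-percentage the shares after dropping the excluded categories."""
    kept = {name: value for name, value in shares.items() if name not in exclude}
    total = sum(kept.values())
    return {name: round(100 * value / total) for name, value in kept.items()}


topline = {"Clegg": 44, "Farage": 33, "Neither": 23}
print(rebase(topline, {"Neither"}))  # {'Clegg': 57, 'Farage': 43}
```

Starting from rounded percentages gives 57/43 rather than 58/42; a one-point gap like this is what you would expect if the figures above were worked out from the unrounded counts.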

What about if we exclude politicians and activists from our sample? This leaves a rather meagre 94 opinionated tweets from unpolarised people (less than half of our original sample size).

As it turns out, and somewhat to my surprise, there was actually very little effect, with the results now amended to

Clegg 55%
Farage 42%

Perhaps implying that the cyberwhipping on both sides was equally effective.

How do I explain the discrepancy between my own results and the worm (and indeed the poll)? It's hard to say. There were a few hashtag "hijacks" - people talking about issues which came up in the debate which were not directly related to the EU; notable examples included Scottish independence and gay marriage, where there were several tweets critical of Farage - by my own rules I coded these as "wins" for Clegg but perhaps these could have been excluded from the sample or coded as "neither". There were several tweets reporting the YouGov poll result which I categorised as neutral as they were merely reporting the mainstream media outlet - I could have coded these as being for Farage, which would have boosted his score a few points. Other than that, there are so many variables that I find it difficult to pinpoint.

Perhaps Sky's primitive method was best?

Sky News opted for a simple approach - they posted a couple of tweets, one in favour of Farage, one for Clegg, and asked for retweets to endorse. This direct approach - closer to a traditional market research technique - might work better in such circumstances, and indeed this was in line with the poll (and the worm):

Where does this leave political social media analysis?

Overall, then, I believe there are multiple issues with political social media samples, although with appropriately thoughtful handling I do think these issues can be overcome. There is certainly a place for fast-turnaround or real-time analysis; it presents significant challenges, although once again these are not insurmountable. Watch out for the next debate on the BBC, for which no doubt there will be more furious analysis and debate.