Back in 2011, in his book “Everything is Obvious”, Duncan Watts mused on the tremendous potential of the volumes of data that the internet now gives us about human interactions. Comparing it to the invention of the telescope, which revolutionised astronomy, he enthused that big data from our collective online activities gives us a powerful new lens for understanding how we behave.
This theme was echoed by all three of the speakers in the session on Learning about People and Society via Analysis of Large-Scale Data on Human Activities at the AAAS meeting that I attended in February. Here I’ll focus solely on some of the work of Jure Leskovec, Assistant Professor at Stanford University. Of particular interest to me was what Leskovec’s studies have taught us about the lifecycle of an individual within an online community – including predicting how long they’re likely to remain a member of it. From a community management perspective, this work could prompt more data-driven ways of monitoring and facilitating engagement online – something I’ll return to later.
Mmmmm – beer!
In their research, Leskovec and colleagues decided to focus on language use. Language is crucial for self-expression, and for creating and reinforcing a sense of group identity. Sociologists and psychologists had already learned plenty about how language use evolves in the offline environment – but what happens online?
Leskovec’s study focused on two large online communities, RateBeer and BeerAdvocate, which are both over a decade old and therefore make it possible to study multiple generations of users of the sites. Furthermore, the sites have many active users – almost 5000 have contributed at least 50 posts each – giving a large amount of data to analyse for variations in language use.
The first things the researchers looked at were the trends in language use amongst individual community members and by the community as a whole. Individual users each follow a linguistic pattern whereby they initially use more personal pronouns and their own word preferences when submitting reviews, but over time they drop the self-referencing and adopt more beer-specific vocabulary.
The community’s linguistic norms evolve over time too. For example, if you’re a beer drinker, in 2003 you’d be referring to a beer’s aroma whereas by 2005 you’d be calling it smell. Another noticeable trend was that reviewers on the sites were found to use more “fruit” terms when describing their beverages as the years passed.
Linguistics – and last orders
Is there any relationship between an individual’s choice of language and that favoured by the community as a whole? And if so, does this relationship change over time?
By taking snapshots of language use across the different months in the study, the researchers were able to calculate how distinctive the language used by an individual was compared to the language used by the rest of the community at that point. They could then compare this to the stage the user was at in her lifecycle – as determined from the total number of posts she had made before leaving the community.
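As a rough illustration of how such a comparison could work, the sketch below scores a post against a model built from one month of community posts. The choice of a unigram model with add-one smoothing and cross-entropy as the distance is an assumption made here for simplicity, and the reviews are made-up toy data; the post above does not specify the exact measure the researchers used.

```python
import math
from collections import Counter

def unigram_model(posts):
    """Count word frequencies over all posts in one monthly snapshot."""
    counts = Counter()
    for post in posts:
        counts.update(post.lower().split())
    return counts

def cross_entropy(post, counts, vocab_size):
    """Average negative log2 probability per word of `post` under the snapshot.

    Add-one smoothing keeps unseen words from producing infinite values.
    Higher values mean the post is further from the community's language.
    """
    words = post.lower().split()
    if not words:
        return 0.0
    total = sum(counts.values())
    log_prob = 0.0
    for word in words:
        p = (counts[word] + 1) / (total + vocab_size)
        log_prob += math.log2(p)
    return -log_prob / len(words)

# Toy snapshot: reviews written by the community in one month (invented data).
snapshot = [
    "strong hoppy smell with a citrus finish",
    "malty smell and a smooth finish",
    "citrus smell with a hoppy bite",
]
model = unigram_model(snapshot)
vocab_size = len(model) + 1  # +1 reserves probability mass for unseen words

in_sync = cross_entropy("hoppy smell with a citrus finish", model, vocab_size)
out_of_sync = cross_entropy("i think my aroma was pleasant", model, vocab_size)
print(in_sync < out_of_sync)  # the second post sits further from the snapshot
```

Repeating this for each of a user's posts, against the snapshot for the month in which it was written, would give a distance curve that can be plotted against the user's position in her lifecycle.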
If you imagine the effects of alcohol on your social interactions on a Friday night in the pub, they’re not all that dissimilar to the changes in language use by individual community members of the review sites. You start off shy and a bit distant from the people you’re with, then warm up and adapt to your companions and finally, if you’ve had a bit too much to drink, become less coherent before parting ways at the end of the evening. So, too, in the online environment, a user will become increasingly receptive to the linguistic norms of the community for the first third of her time within it, but after a point of being maximally “in sync”, the gap between her language use and that of the community will widen again until she eventually leaves.
It’s not you – it’s me
So, what’s the reason for the growing gap in language between an individual and other users of the site towards the end? Given that the researchers had already observed that the community’s use of language continually evolved, the key question was whether a reviewer moved away from the core community because she started to use language that was foreign to it, or because she failed to adapt to the changing word use within the community.
This can be determined by comparing a user’s current language use to their previous language use. For these sites, users’ “last orders” came because they had stopped adapting their language at later stages in their lifecycle – they’d got stuck in their linguistic ways while the community continued to evolve away from them.
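One simple way to picture this self-comparison is to measure how much a user's vocabulary overlaps between consecutive batches of her own posts. The sketch below uses Jaccard similarity over fixed-size windows; both the measure and the window size are assumptions chosen for illustration, not the method from the paper, and the reviews are invented.

```python
def vocabulary(posts):
    """Collect the set of distinct lowercase words used across `posts`."""
    words = set()
    for post in posts:
        words.update(post.lower().split())
    return words

def jaccard(a, b):
    """Share of words common to both sets, out of all words in either set."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def self_similarity(posts, window=2):
    """Similarity of each window of a user's posts to the window before it."""
    starts = range(0, len(posts) - window + 1, window)
    vocabs = [vocabulary(posts[i:i + window]) for i in starts]
    return [jaccard(prev, cur) for prev, cur in zip(vocabs, vocabs[1:])]

# Toy history for one user, in chronological order (invented data).
user_posts = [
    "i liked the aroma",
    "my first stout was sweet",
    "hoppy aroma with citrus notes",
    "dark malty aroma with citrus",
    "hoppy aroma with citrus notes",
    "dark malty aroma with citrus notes",
]
similarities = self_similarity(user_posts)
print(similarities)  # low early on, then high once the vocabulary stops changing
```

A similarity that climbs and stays high late in a user's history would suggest that her vocabulary has stopped changing, which is the "stuck in their linguistic ways" behaviour described above.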
Calling time at the bar
The user lifecycle of coming into sync with the community and then becoming linguistically distant from it follows the same pattern for everyone – and it correlates with how far through their own user lifecycle they are, rather than occurring after a fixed number of posts. Furthermore, the lifecycle analysis showed that users quickly learned to pick up new words once they’d been introduced into the community – lexical innovation – but that this decreased over time. The users who were most adaptive initially were also the ones likely to contribute more posts.
These observations raised the intriguing possibility that a user’s exit from the site could be predicted based on their initial few posts. Leskovec and colleagues combined the various measures developed in the study – including first person pronoun use, lexical innovation, and consistency of language use – to show that it was possible to estimate whether a user would leave the site within a certain number of posts after their first 20 or 40. This gave a significant improvement over existing methods used to estimate churn rate based on activity levels alone.
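To make the idea of early-post features concrete, here is a minimal sketch that computes two of the measures named above from a user's first few posts. The function name `early_features`, the pronoun list, and the externally supplied set of `new_words` are all hypothetical choices for illustration; a real predictor would feed features like these into a classifier, which is not shown here.

```python
# Singular first-person pronouns only; this word list is an assumption.
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

def early_features(posts, new_words, n=20):
    """Compute simple linguistic features over a user's first `n` posts.

    `new_words` is a (hypothetical) set of words that recently entered the
    community's vocabulary, used as a stand-in for lexical innovation.
    """
    words = [w for post in posts[:n] for w in post.lower().split()]
    if not words:
        return {"pronoun_rate": 0.0, "innovation_rate": 0.0}
    pronouns = sum(w in FIRST_PERSON for w in words)
    adopted = sum(w in new_words for w in words)
    return {
        "pronoun_rate": pronouns / len(words),
        "innovation_rate": adopted / len(words),
    }

# Toy example: two early posts and two recently introduced words (invented data).
posts = ["i liked my pint", "hazy juicy ale"]
features = early_features(posts, new_words={"hazy", "juicy"})
print(features)
```

Each feature is a rate per word, so users who write long and short reviews can be compared on the same scale.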
Implications for community management
Imagine if, as the manager of an online community, you could track the type of linguistic changes described in this paper. Perhaps you could identify when your current contributors are likely to lose interest in the community, and take steps to compensate in advance.
Maybe further studies could identify what happens to users once they leave one site – do they start completely afresh on another and repeat the same lifecycle there? Or does their behaviour somehow evolve as they move through a number of online communities? Does being a member of multiple online communities at the same time affect the rate at which your language use evolves, or lead to cross-fertilisation of terminology between those communities? Is it possible to identify users as being culturally bi- or even multi-lingual online?
I find it heartening to see an example of a data-driven approach to understanding online communities, and it underlines once again that community managers need to have a range of tools in their toolkit. It’s important to make new analytics methods and an awareness of relevant research in the social sciences part of this toolkit – doing so makes a lot of sense for a role that depends so much on good communication between people.
No Country for Old Members: User Lifecycle and Linguistic Changes in Online Communities – Danescu-Niculescu-Mizil et al., Proceedings of WWW, 2013.