How to analyse your customers' social profiles in 24 hours (Part I – assumptions and data collection) → Big Data Partnership → Unlock Value from Complex Data

Social profiles tell us a lot about the interests of their owners, and about the people and organisations they follow and the people who follow them. This blog post summarises what information you can get by collecting and analysing your customers' profiles in 24 hours. In fact, once the data has been unlocked, the process can be automated to produce real-time analysis and prediction, giving valuable insight into human behaviour from multiple angles. I'll give an example of this analysis using Twitter, and highlight what information can be gathered and what can be done with that data.

In our example, we are interested in analysing the customer profiles of the RSPB (Royal Society for the Protection of Birds). There are two interesting (and closely related) aspects that we want to capture. First, we want to know which topics are related to the organisation and who is actively contributing to them. We can then use this data to discover related trends and topics and to understand what users are engaged in. Second, we want to know about the organisation's actual followers, as their profiles may also tell us a lot about the organisation and about what interests these users.

Data collection is the first step in this process. On the topic side, we identify a number of search terms related to the organisation and count the hashtags that occur in the results of this search. We then follow up each lead iteratively to a certain depth (after 2-3 steps the hashtags become unrelated). We gather all of the historical data that Twitter allows us to pull back, and then crawl these terms continually for a period of time. The topics returned range from general ones such as #nature, #wildlife and #birds to ones specific to the organisation (for example #rspbsteppingup). The selection can be updated periodically to capture trending topics.
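The iterative follow-up of hashtags can be sketched as a breadth-first expansion. This is only an illustration under stated assumptions, not the crawler used for this post: `search` is a hypothetical caller-supplied function that returns tweet texts for a term, and `max_depth` and `min_count` are made-up parameter names for the depth limit and a minimum occurrence threshold.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased hashtags (without '#') found in a tweet's text."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def expand_terms(seed_terms, search, max_depth=2, min_count=2):
    """Grow a set of search terms by following co-occurring hashtags.

    `search` is a hypothetical function: term -> iterable of tweet texts.
    Hashtags seen fewer than `min_count` times in one round are ignored.
    """
    seen = {term.lower() for term in seed_terms}
    frontier = sorted(seen)
    for _ in range(max_depth):
        counts = Counter()
        for term in frontier:
            for text in search(term):
                counts.update(extract_hashtags(text))
        # Follow only hashtags that are new and occur often enough.
        frontier = sorted(
            "#" + tag
            for tag, n in counts.items()
            if n >= min_count and "#" + tag not in seen
        )
        if not frontier:
            break
        seen.update(frontier)
    return seen


# Tiny in-memory stand-in for a real search call (illustrative data only).
SAMPLE = {
    "rspb": [
        "Great day at the reserve #birds #nature",
        "Join us #birds #rspbsteppingup",
        "#nature walk #rspbsteppingup",
    ],
    "#birds": ["#birds and #wildlife", "more #wildlife #birds"],
    "#nature": ["#nature #wildlife"],
}


def fake_search(term):
    return SAMPLE.get(term, [])


terms = expand_terms(["rspb"], fake_search, max_depth=2)
```

Keeping the depth small matches the observation above that hashtags drift off topic after two or three steps; the occurrence threshold is one simple way to drop one-off tags along the way.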

We can also collect data from the followers of the organisation. This is trickier, as Twitter restricts the number of requests to its REST API, so the user profiles cannot be collected directly (especially in this case, where we have around 40K user profiles to crawl). One workaround we have found is to use the search API to collect the profiles.
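One way to picture this workaround: if each search result carries a copy of its author's profile, the profiles can be harvested as a side effect of the search crawl. The sketch below assumes each tweet is a dict with an embedded author record under a `user` key that has an `id` field; these field names are illustrative and may not match the API version used for this post.

```python
def profiles_from_search(tweets):
    """Build a user-id -> profile map from tweets that embed their author.

    Assumes (hypothetically) that each tweet is a dict with a "user" dict
    containing at least an "id" field. Tweets without one are skipped.
    """
    profiles = {}
    for tweet in tweets:
        user = tweet.get("user")
        if not user or "id" not in user:
            continue
        # A later tweet overwrites an earlier copy of the same profile,
        # so the map holds the last copy seen for each user.
        profiles[user["id"]] = user
    return profiles


# Illustrative data only.
results = [
    {"text": "Spotted a kingfisher", "user": {"id": 1, "screen_name": "alice"}},
    {"text": "Garden birdwatch", "user": {"id": 2, "screen_name": "bob"}},
    {"text": "Another kingfisher", "user": {"id": 1, "screen_name": "alice"}},
    {"text": "No author attached"},
]
profiles = profiles_from_search(results)
```

Because every tweet fetched also yields a profile, no separate per-user request is needed for users who appear in the search results.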

After running the two crawlers overnight, we get:

From the RSPB account followers:

  • 27,605 user profiles
  • 341k tweets

From the related topics and tags (80 tags and search terms were identified):

  • 15,727 users
  • 50k related tweets

It is interesting to note that far fewer tweets were collected from the topics than from the followers, but the topic data contains less noise and is therefore more closely related to the target organisation. The tweets collected from the followers cover a wide variety of topics, only a small proportion of which are related to the organisation. This lets us capture user interest in a more general sense, while still being loosely related to the target organisation.
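A rough way to quantify how "related" a set of tweets is would be to measure the share that mention at least one tracked hashtag. The function below is a minimal sketch of that idea, not the analysis used for this post; `related_fraction` and its parameters are hypothetical names, and the sample tweets are made up.

```python
import re

HASHTAG = re.compile(r"#(\w+)")


def related_fraction(tweets, tracked_tags):
    """Return the fraction of tweet texts that mention any tracked hashtag."""
    tracked = {tag.lower().lstrip("#") for tag in tracked_tags}
    if not tweets:
        return 0.0
    related = sum(
        1
        for text in tweets
        if tracked & {tag.lower() for tag in HASHTAG.findall(text)}
    )
    return related / len(tweets)


# Illustrative data only.
follower_tweets = [
    "Lovely morning at the reserve #birds",
    "Stuck in traffic again",
    "Watching the football tonight #sport",
    "Coffee time",
]
share = related_fraction(follower_tweets, ["#birds", "#nature", "#wildlife"])
```

Comparing this share between the follower crawl and the topic crawl would give a simple, if crude, measure of the noise difference described above.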

(See Part II for graphics showing the exciting results.)

Posted on June 1, 2012 in Apache Hadoop, Blog, Business, Technology
