We’ve tried hard, but sadly we are not able to bring back our Twitter data tools.
Simply put, this is because Twitter have no route to market to sell low-volume data for spreadsheet-style individual use.
There’s lots of confusion in the market about the exact rules, and why they exist. This blog post tries to explain them clearly!
How can you get Twitter data?
There are four broad ways.
1. Developers can use the API to get data for their own use. The rate limits are actually quite generous, much better than, say, LinkedIn’s. It’s an easy and powerful API to use.
There are two problems with this route. Firstly, it sets the expectation among developers that you can do whatever the API allows. You can’t – in practice you have to follow one of the routes below, or Twitter will shut down your application.
Secondly, it is unfair to non-programmers, who can’t get access to data which programmers easily can. More on that in the “why” section below.
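To show what “generous rate limits” mean in practice, here’s a minimal sketch of respecting them. The header names (`x-rate-limit-remaining`, `x-rate-limit-reset`) are the ones Twitter’s REST API documents; the function itself is a simplified assumption about how you’d use them, not Twitter’s code.

```python
import time

def seconds_until_reset(headers, now=None):
    """Given Twitter rate-limit response headers, return how long to
    wait before the next request (0 if requests remain in the window)."""
    now = time.time() if now is None else now
    remaining = int(headers.get("x-rate-limit-remaining", 0))
    if remaining > 0:
        return 0.0
    # x-rate-limit-reset is a Unix timestamp for when the window reopens
    reset = int(headers.get("x-rate-limit-reset", now))
    return max(0.0, reset - now)
```

A polite client checks this after every response and sleeps when it returns a positive number, rather than hammering the API until it gets HTTP 429s.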
2. Software companies can make an application which uses the developer API.
As soon as it gets serious, they should join the Twitter Certified Program to make sure Twitter approve of the app. Ultimately, only Twitter can say whether or not their T&Cs are being met.
These applications can’t allow general data analysis and coding by their users – they have to have specific canned dashboards and queries. This doesn’t meet ScraperWiki’s original vision of empowering people to understand and work with data how they like.
3. You can buy data from a reseller. Datasift is a fantastic product, which indexes the data for you and provides lots of other social media data besides. Gnip is now owned by Twitter, and is still in the process of blending into them – they’re based in Colorado, rather than San Francisco.
Either way, Twitter itself has to vet your exact use case, and your business has to be worth at least $3000 / month to them to make this worthwhile.
The actual cost of roughly 10 cents / 1000 Tweets is not too bad, lots of our customers could pay that. But few have the need to get 30 million Tweets a month! In lots of ways, this option is too powerful for most people.
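The arithmetic behind that threshold is simple enough to check – a quick sketch, using the round numbers quoted above:

```python
# Reseller economics, using the approximate figures quoted above.
PRICE_PER_TWEET = 0.10 / 1000     # roughly 10 cents per 1,000 Tweets
MIN_MONTHLY_SPEND = 3000          # dollars – the minimum worth vetting

# How many Tweets a month you'd have to buy to justify the minimum spend:
tweets_per_month = round(MIN_MONTHLY_SPEND / PRICE_PER_TWEET)
print(tweets_per_month)  # 30000000 – i.e. 30 million Tweets a month
```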
All of this shows that it is worth talking to, and lobbying, Twitter for new ways to get data.
Why do Twitter restrict data use?
The obvious – and, I think, incorrect – answer is “for commercial reasons”. Here are the real ones.
1. Protect privacy and stop malicious uses.
If you use the firehose via, say, Datasift, you have to delete your copy of a Tweet as soon as a user deletes it. Similar rules apply if, for example, somebody makes their account private, or deletes their account. This is really impressive – fantastic for user privacy. Part of the reason Twitter are so careful about vetting uses is to make sure this is followed.
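Mechanically, compliance looks something like this: the stream interleaves delete notices with Tweets, and a consumer must honour them against any local copy. The message shape below follows Twitter’s documented streaming delete notice; the in-memory dict is a stand-in for whatever database you actually use.

```python
def handle_stream_message(message, store):
    """Apply one streaming message to a local Tweet cache (a dict of
    id -> Tweet). Delete notices remove any stored copy immediately."""
    if "delete" in message:
        # Shape follows Twitter's streaming delete notice
        tweet_id = message["delete"]["status"]["id"]
        store.pop(tweet_id, None)   # you must not keep deleted Tweets
    elif "id" in message:
        store[message["id"]] = message

# Example: a Tweet arrives, then the user deletes it.
cache = {}
handle_stream_message({"id": 1, "text": "hello"}, cache)
handle_stream_message({"delete": {"status": {"id": 1, "user_id": 2}}}, cache)
print(cache)  # {} – no trace of the deleted Tweet remains
```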
Twitter also prevent uses which might harm their users in other ways. I don’t know any details, but I understand that they stop Governments gaining large volumes of Twitter data which could be used to do things like identify anonymous accounts by looking at patterns. I’m guessing this has come from Twitter’s rise to prominence during various ‘Twitter Revolutions’, such as in Iran in 2009.
2. They’re a media company now.
Twitter has changed from its early days: it is now a media company, rather than a messaging bus. For example, the front page of their developer site is about Twitter Cards and embedding Tweets, with no mention of the data features. This means their focus is on a good consumer experience, and advertising, not finding new routes to market for data.
3. They’re missing bits of the market.
No company can cover everything its ecosystem might want. In this case, we think Twitter are simply missing a chunk of the market, and could get more revenue from it.
While there are plenty of products letting you analyse Twitter data in specific ways, there is nothing if you want to use Excel, or other desktop tools like Tableau or Gephi.
For example, Tableau are partnered with Datasift, which from the outside might make it look like Tableau users are covered. Unfortunately, customers still have to have their use case vetted, and be prepared to spend at least $3000 / month. Also, the Tweets are metered rather than limited, making it awkward for junior staff to freely make use of the capability. It’s just too powerful and expensive for many use cases.
The users in this “missing middle” don’t want to learn a new, limited data analysis interface. They want to use the simple desktop data analysis products they’re already familiar with. They also just want a file – they know how to keep track of files.
The ScraperWiki platform continues without Twitter data. You can accurately extract tables from PDFs and much more.
We know a lot about Twitter data, and have contacts with lots of parts of the ecosystem. If you have a high value use of the data, our professional services division are happy to help.