Saturday, November 27, 2010

Web Scraping and Copyright



Technology develops faster than the law does. This can have both good and bad results. Take the practice of web scraping, for instance. The same type of software that indexes the content of your site for search engine results can also be used to rob you of your innovative content. Instead of merely crawling the content to make it searchable, the software can also 'scrape up' the content for use somewhere else.
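To see how low the technical barrier is, here is a minimal sketch of a scraper in Python using only the standard library. Everything in it is hypothetical: the `ListingScraper` class, the `class="listing"` markup, and the sample page are all illustrative, and a real scraper would fetch the HTML over the network rather than inline it.

```python
from html.parser import HTMLParser

# Hypothetical sketch: pull every product listing out of a page's HTML.
class ListingScraper(HTMLParser):
    def __init__(self):
        super().__init__()
        self._in_listing = False
        self.listings = []

    def handle_starttag(self, tag, attrs):
        # Start collecting text when we enter a listing element.
        if tag == "li" and ("class", "listing") in attrs:
            self._in_listing = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_listing = False

    def handle_data(self, data):
        if self._in_listing and data.strip():
            self.listings.append(data.strip())

# In practice the page would be fetched with urllib.request.urlopen();
# it is inlined here so the sketch is self-contained.
page = ('<ul><li class="listing">Widget A $10</li>'
        '<li class="listing">Widget B $12</li></ul>')
scraper = ListingScraper()
scraper.feed(page)
print(scraper.listings)  # the scraped content, ready to be republished anywhere
```

A few dozen lines like these, run across an entire site, are all it takes to lift a database-style list wholesale.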

The logical legal tool to deal with web-scraping abuse is copyright.
However, Sensis recently discovered to its disgust that copyright won't always protect content that doesn't meet the Federal Court's benchmarks for originality. Many IP lawyers throughout Australia aren't convinced this was the right decision: it leaves the efforts of database developers and other curators of information unprotected.

In the meantime, this means that contractual provisions become all-important if you are providing data feeds like product lists, timetables, or any database-style lists to other parties. Your content-sharing agreements should take account not only of how your content is used, but also of how it might be misused, and what measures the parties should take to stop that misuse. For instance, let's say you let me have access to one of your data feeds for the purposes of putting it on my website, and I pay you a fee for the use of the information. I may do nothing with the information other than put it on the site, but a third party could scrape the data and reproduce it elsewhere. The best way for you to deal with that problem is to require me to take technical and non-technical measures against it.
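One common technical measure a contract might require is per-client rate limiting, which makes bulk scraping slower and easier to detect. The sketch below is illustrative only: the `allow_request` function, the thresholds, and the in-memory store are assumptions, not a production design.

```python
import time
from collections import defaultdict, deque

# Hypothetical policy: allow at most MAX_REQUESTS per client IP per WINDOW seconds.
MAX_REQUESTS = 60
WINDOW = 60.0

_history = defaultdict(deque)  # client IP -> timestamps of recent requests

def allow_request(client_ip, now=None):
    """Return True if this request is within the rate limit."""
    now = time.monotonic() if now is None else now
    recent = _history[client_ip]
    # Drop timestamps that have aged out of the window.
    while recent and now - recent[0] > WINDOW:
        recent.popleft()
    if len(recent) >= MAX_REQUESTS:
        return False  # over the limit: likely bulk scraping, so block or throttle
    recent.append(now)
    return True
```

A website serving the licensed feed would call `allow_request` before answering each request, refusing clients that exceed the limit. It is not foolproof, but combined with contractual obligations it raises the cost of misuse.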

Without these sorts of provisions in your content-sharing agreements, your content may end up effectively in the public domain much faster than you intended. This can reduce potential licensing income significantly, so tread carefully.

1 comment:

patrick_herrera said...

Great to see a local legal firm discussing these issues.
Companies like Google have helped fuel the concept that 'information wants to be free', and this has frequently unlocked a lot of value in underutilised assets on the web, particularly when combining different sources together. However the concept of ownership becomes increasingly blurred as this new resource moves from the producers to the consumers. I think it is fair to say that owning raw data is no longer enough, and unless you can protect it absolutely (good luck with that!), the money is frequently going to be made by those who can package it into something useful and accessible to a wider audience.

Often the fight is against your own customers who want to do things in ways you never intended, or have no control over. The battles over 'fair use' when it comes to copyright material are becoming more and more diverse and the legislation is struggling to keep up.

My company, Future Medium, does a lot of work in the area of sharing data, and most of the exciting possibilities we see every day involve seeing an interesting dataset and thinking of ways to expose it via a user-friendly application, or combine it with something else with an often exponential increase in value.

It seems like the people in the middle who manage to license the raw data in such a way that it can be freely made available to others are the ones getting all the attention – just witness Google Maps as practically a household name.
Hopefully people from all parts of the equation can see this as an example of what can be achieved, and try to identify other data sources that can be used to the benefit of all parties without waiting for Google to come along yet again and do it for us…