Friday, 4 July 2008

best search dataset ever?

In a current lawsuit between Google's YouTube and Viacom, Google have been instructed to hand over their logs of user viewing habits. Google are upset about it, but at least the judge turned down the request for Google to hand over their source code for filtering copyrighted material! Google have requested, although I believe it's not been confirmed yet, that they get to anonymise the logs first, to respect the users' privacy. I personally hope this is allowed.

Anyway, despite this interesting privacy issue, I can't help thinking that 12 terabytes of usage logs from YouTube would be an AMAZING research resource for investigating user behaviour. Sadly they don't have facets to chat about, but the logs could tell us how people have used query refinements, spelling corrections, categories, filters, similar clips, recommended clips and so much more!

Google might as well do something useful with it, if the data is going to be shown to at least one third party.

