AOL’s Big Goof

AOL’s research department released a dataset containing the search histories of roughly 500,000 users, about 20 million search queries in all. Their stated purpose: “The goal of this collection is to provide real query log data that is based on real users. It could be used for personalization, query reformulation or other types of search research.” AOL soon removed the data from its site, but the damage was already done, and mirrors are all over the net.

This data is very valuable to marketers, SEOs, and spammers. The problem is that each user is identified by a unique ID, so every search from a particular user is linked by that ID. With enough searches, it can be possible to work out who the person is. Since AOL uses Google as its search engine, this is essentially the same kind of data that Google fought to keep out of the government’s hands. Now it is all over the net, and people are finding all kinds of interesting info.
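To show why a single ID is enough to tie someone’s searches together, here is a minimal sketch that groups queries by user. It assumes the files are tab-separated with the columns AnonID, Query, QueryTime, ItemRank, ClickURL (my recollection of the layout, not verified here), and the rows below are made up for illustration.

```python
from collections import defaultdict

# Made-up rows in the assumed tab-separated layout of the AOL files:
# AnonID, Query, QueryTime, ItemRank, ClickURL (empty when nothing was clicked).
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1001\tpizza delivery springfield\t2006-03-01 18:02:11\t1\thttp://www.example.com\n"
    "2002\tused cars\t2006-03-01 12:00:00\t2\thttp://cars.example.com\n"
    "1001\tsmith family reunion\t2006-03-02 09:15:40\t\t\n"
    "1001\tdentist near elm street\t2006-03-05 14:30:05\t\t\n"
)


def group_by_user(text):
    """Collect every query under its anonymous user ID, in file order."""
    history = defaultdict(list)
    for line in text.splitlines()[1:]:  # skip the header row
        if not line:
            continue
        fields = line.split("\t")
        anon_id, query = fields[0], fields[1]
        history[anon_id].append(query)
    return dict(history)


history = group_by_user(SAMPLE)
for query in history["1001"]:
    print(query)
```

Each search on its own says little, but user 1001’s town, family name, and street all end up in one list, which is exactly how a "anonymous" ID turns into a person.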

It is only a matter of time until someone releases a web interface for searching and parsing this data. I am sure Google link spammers are already mining it for the best keywords to spam. I imagine Google will have an interesting response soon. And this is definitely going to hurt AOL. I am glad I don’t use AOL for anything other than AIM, but it would not surprise me to find my chats online soon.

Of course, I have already downloaded the data, and though I don’t have much time since I’m moving in two weeks, I will probably import it into MySQL and run a few queries. >:)
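For anyone who wants to try the same thing, here is a rough sketch of the import-and-query idea. It uses Python’s built-in sqlite3 in place of MySQL so it runs without a server, assumes the same tab-separated AnonID, Query, QueryTime, ItemRank, ClickURL layout as above, and the table and column names (`queries`, `query_text`, etc.) are my own choices. The rows are made up.

```python
import csv
import io
import sqlite3

# Made-up rows in the assumed AOL layout: AnonID, Query, QueryTime, ItemRank, ClickURL.
SAMPLE = (
    "AnonID\tQuery\tQueryTime\tItemRank\tClickURL\n"
    "1001\tused cars\t2006-03-01 10:00:00\t1\thttp://cars.example.com\n"
    "1001\tpizza delivery\t2006-03-01 18:02:11\t\t\n"
    "2002\tused cars\t2006-03-02 12:00:00\t3\thttp://cars.example.com\n"
    "3003\tcheap flights\t2006-03-03 08:30:00\t\t\n"
    "3003\tused cars\t2006-03-03 08:45:00\t2\thttp://cars.example.com\n"
    "2002\tcheap flights\t2006-03-04 21:10:00\t1\thttp://fly.example.com\n"
)


def load_log(conn, text):
    """Create a queries table and bulk-insert the tab-separated log."""
    conn.execute(
        "CREATE TABLE queries (anon_id INTEGER, query_text TEXT, "
        "query_time TEXT, item_rank INTEGER, click_url TEXT)"
    )
    # QUOTE_NONE: treat quote characters inside search terms as plain text.
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row
    rows = [
        (int(r[0]), r[1], r[2], int(r[3]) if r[3] else None, r[4] or None)
        for r in reader
        if r  # ignore blank lines
    ]
    conn.executemany("INSERT INTO queries VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def top_queries(conn, limit=3):
    """Most frequent search terms across all users."""
    return conn.execute(
        "SELECT query_text, COUNT(*) AS n FROM queries "
        "GROUP BY query_text ORDER BY n DESC, query_text LIMIT ?",
        (limit,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
load_log(conn, SAMPLE)
print(top_queries(conn))  # [('used cars', 3), ('cheap flights', 2), ('pizza delivery', 1)]
```

On the real 20 million rows, MySQL’s `LOAD DATA INFILE` would likely be a faster way in than row-by-row inserts; the `GROUP BY` query itself should carry over with little change.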
