Google Dance Esmerelda

Every month, Google takes the results of the previous month's deep crawl and pushes them out to its servers. Search results turn weird and unpredictable while each of the thousands of Google servers receives the update. This is called the "Google Dance." At WebmasterWorld this month, Google Dances were given names, like hurricanes. As we write this, Esmerelda is hitting the Google servers. The recent updates have been strange, though, which points to some major changes in the way Google works. In the past, Google has operated two sets of web crawlers: the "fresh" crawl and the "deep" crawl. The deep crawl is exhaustive, and runs once a month. The fresh crawl is superficial, and runs continuously. Recently, webmasters have noticed that the deep crawler has disappeared, and that the fresh crawler is behaving like the deep crawler instead. The conclusion is that Google is moving towards a more continuous update process. Some speculate that Google is responding to the prospect of a Microsoft search engine, as well as to increased competition from the existing search services. Our pet theory is that PigeonRank is finally being implemented. A more thorough explanation can be found at Kuro5hin.

Web Diarists, Collaborative Filtering, and Scale-Free Networks

Even though it disparages Josh Marshall, we have to thank No Data Source for the new Hugh Hewitt piece on the Big Four web logs. There was a time when web-based journalism was supposed to somehow revolutionize the delivery of news. The combination of low overhead and accessibility that web sites provide was supposed to wrest control of news from corporations and put it in the hands of the people. Now, presumably, anyone can publish their own broadsheet. It's unavoidable that readership is going to gravitate towards a small group of news providers -- no one person can read everything. The decision of which news sources to read is influenced in large part by their visibility and by referrals from friends -- it's a textbook scale-free network, where things that are popular tend to stay popular, and the ignored stay ignored. The result is Hewitt's Big Four: Instapundit, Mickey Kaus, Andrew Sullivan and the Volokh Conspiracy. Together, these four news outlets exert an enormous amount of influence over the day's agenda, reducing most publishers (like ourselves) to echoes and rehashings of their posts. This is natural, of course -- reputation and habit are an essential part of the intellectual economy. It's also functionally identical to the "corporate media" problem: the agenda's controlled by a handful. It's useful to look at how computer scientists deal with this "collaborative filtering" problem. After a time, ranking items by strict popularity becomes less useful. The homogenization of search results is going to prevent valuable but unknown items from being found. The simplest solution is to insert unpopular items, at random. This doesn't interfere too much with the accuracy of the results, but does give a fighting chance to the underdogs. For you, the news consumer, this means occasionally trying something new. Just visit WebLogs or another blog aggregator, and see if you can't find a new favorite.
Unfortunately, scale-free networks tend to discourage this behavior. You need a large number of people accidentally picking up the same underdog at the same time in order to gather enough momentum to bring it to the top. Epidemiology studies scale-free networks, too. Viruses get passed around by a core group, and infect populations in clusters. So, it seems, truth is a virus.
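
For the curious, here is a rough sketch of the "insert unpopular items at random" trick mentioned above. It is only an illustration: the function name, the blog names, and the readership numbers are all made up, and real collaborative filtering systems are considerably more involved. Most of the list is filled strictly by popularity, and a small number of slots are handed to randomly chosen items from further down.

```python
import random

def rank_with_exploration(popularity, k, explore_slots=1, rng=None):
    """Return k items: the most popular ones, plus a few random picks
    from everything that did not make the popular head of the list.

    `popularity` maps item -> score. All names here are hypothetical.
    """
    rng = rng or random.Random()
    # Strict popularity ranking, highest score first.
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    k = min(k, len(ranked))
    explore_slots = min(explore_slots, k)
    # The head is chosen purely by popularity.
    head = ranked[:k - explore_slots]
    # Everything else competes at random for the remaining slots.
    tail = ranked[k - explore_slots:]
    underdogs = rng.sample(tail, explore_slots)
    return head + underdogs

# Made-up readership figures for made-up blogs.
readers = {
    "blog_a": 9000, "blog_b": 7000, "blog_c": 6500, "blog_d": 6000,
    "blog_e": 40, "blog_f": 12, "blog_g": 3,
}
print(rank_with_exploration(readers, k=4, explore_slots=1))
```

The first three entries are always the three most popular blogs; the fourth is drawn at random from the rest, so an obscure blog occasionally gets seen. Raising `explore_slots` trades accuracy for more chances for the underdogs.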

SpamArrest Smackdown

The premise is this: you receive an email of unknown origin, and SpamArrest bounces the message back to the sender, asking them to confirm their humanity before it forwards the in-doubt email to your inbox. A lovely idea, until SpamArrest sends spam. Their license allows them to use any email address they receive, from any source, for their own purposes. More from PoliTech.
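
The challenge-response premise described above can be sketched in a few lines. This is a toy model under our own assumptions, not SpamArrest's actual implementation: the class and method names are hypothetical, and the step of actually emailing the challenge is left out.

```python
from dataclasses import dataclass, field

@dataclass
class ChallengeResponseFilter:
    """Toy challenge-response mail filter (hypothetical, for illustration)."""
    whitelist: set = field(default_factory=set)
    held: dict = field(default_factory=dict)  # sender -> list of held messages

    def receive(self, sender, message):
        # Known senders go straight through to the inbox.
        if sender in self.whitelist:
            return ("deliver", [message])
        # Unknown senders are held, and a challenge would be sent back.
        self.held.setdefault(sender, []).append(message)
        return ("challenge", [])

    def confirm(self, sender):
        # The sender answered the challenge: trust them from now on
        # and release whatever was being held for them.
        self.whitelist.add(sender)
        return self.held.pop(sender, [])

mail_filter = ChallengeResponseFilter()
print(mail_filter.receive("stranger@example.com", "hello"))
print(mail_filter.confirm("stranger@example.com"))
print(mail_filter.receive("stranger@example.com", "hello again"))
```

Note that the filter necessarily collects the address of everyone who writes to you, which is exactly what makes the license terms worrying.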