Daily Archives: June 20, 2007

When Web-scraping Doesn’t Cut It

When you search for my name on the “professional search directory” ZoomInfo, you get some very interesting results. My favorite:

Very Subtle Captain

I’m pretty sure their web crawling/scraping algorithm needs a little tweaking. Somehow they also linked me to an article about New Hampshire history (circa 1770–1790) that mentioned a “Captain Chase Taylor.” They might want to refine both the sources they collect their data from and the methods they use to collect it.

It’s a monumental task to build useful profiles of people from content scattered across the web. Spock is one startup that thinks it can do a better job, and it’s going after an even larger data set than the one ZoomInfo currently boasts. If it scours information in a controlled way from the right places, I think it may achieve some decent results. After all, the current bar is set at combining Revolutionary War–era military records with soccer goalie quotes from 2005, so the sky’s the limit.