Science scraping with YQL
06 Sep 2010 in yql, science, scripting, hacking, cool, solo10
Last Saturday at Science Online London I gave a quick tutorial on YQL and how it might be used to mash up scientific data sets. Below I list some of the sample queries that I was playing with. Before you get started with the console, have a look through the documentation. I got a lot of mileage out of the part about filters and joins. The blog post by Paul Hogan on using YQL for library mashups was also very helpful.
In my presentation I was originally looking at extracting data from a report on the spread of soybean rust.
Here are a few sample queries to get you started:
This query just pulls all of the HTML from the page. Open in console.
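The linked query is not reproduced in the text, so here is a minimal sketch of what a whole-page query looks like with YQL's `html` table. The page address is a hypothetical placeholder (the report's real URL is not given here), and the endpoint is the public YQL address as I understand it was documented at the time. The Python only builds the query string and the request URL; it makes no network call.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real report URL is not given in this post.
PAGE_URL = "http://example.com/soybean-rust-report.html"

# Assumed public YQL endpoint, as documented around 2010.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def yql_request_url(query, fmt="json"):
    """Return the GET URL that would submit `query` to the YQL endpoint."""
    return YQL_ENDPOINT + "?" + urlencode({"q": query, "format": fmt})


# Select everything YQL can parse out of the page.
whole_page_query = 'select * from html where url="{}"'.format(PAGE_URL)

print(whole_page_query)
print(yql_request_url(whole_page_query))
```

Pasting the printed query string into the console should be equivalent to requesting the printed URL.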
This query extracts only the table element from the page. Open in console.
This query pulls the second item from each row of the table. Open in console.
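Both the table-only query and the second-item query come down to adding an `xpath` filter to the `html` table. The sketch below is an assumption about their shape rather than the exact queries I ran: the page URL is a placeholder and the helper `html_query` is made up for illustration. To show what the second XPath selects, the same expression is run locally against a tiny invented table with Python's `xml.etree.ElementTree`, which supports this subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: the real report URL is not given in this post.
PAGE_URL = "http://example.com/soybean-rust-report.html"


def html_query(url, xpath):
    """Build a YQL statement against the html table with an xpath filter."""
    return "select * from html where url=\"{}\" and xpath='{}'".format(url, xpath)


# Only the table element(s) of the page.
table_query = html_query(PAGE_URL, "//table")

# The second cell of every table row.
second_cell_query = html_query(PAGE_URL, "//table//tr/td[2]")

# Invented stand-in for the report table, used only to illustrate the XPath.
TOY_TABLE = """
<table>
  <tr><td>r1c1</td><td>r1c2</td><td>r1c3</td></tr>
  <tr><td>r2c1</td><td>r2c2</td><td>r2c3</td></tr>
</table>
""".strip()

root = ET.fromstring(TOY_TABLE)
# td[2] means "the second td child of its row" (XPath positions start at 1).
second_cells = [td.text for td in root.findall(".//tr/td[2]")]

print(table_query)
print(second_cell_query)
print(second_cells)
```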
For this query I copied the table onto my own server and added some basic proto-semantic markup to the column descriptors. I could then call out specific columns from the table. Open in console.
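One way such markup can be used, sketched under loud assumptions: suppose each cell carries a `class` attribute naming its column. The class names, the address of the copied page, and the toy table below are all hypothetical, since the actual markup is not shown in this post. With that in place, a column can be selected by name instead of by position.

```python
import xml.etree.ElementTree as ET

# Hypothetical address of the copied, annotated table.
MY_COPY_URL = "http://example.com/my-copy-of-table.html"

# Hypothetical column name used as the class attribute on each cell.
COLUMN_CLASS = "county"

# YQL statement selecting every cell marked with that class.
column_query = (
    'select * from html where url="{}" '
    "and xpath='//td[@class=\"{}\"]'"
).format(MY_COPY_URL, COLUMN_CLASS)

# Invented stand-in table with class attributes on the cells.
TOY_TABLE = """
<table>
  <tr><td class="state">state-1</td><td class="county">county-1</td></tr>
  <tr><td class="state">state-2</td><td class="county">county-2</td></tr>
</table>
""".strip()

root = ET.fromstring(TOY_TABLE)
# Same attribute predicate, evaluated locally to show what it selects.
county_cells = [
    td.text for td in root.findall(".//td[@class='{}']".format(COLUMN_CLASS))
]

print(column_query)
print(county_cells)
```

Selecting by class keeps the query working if columns are later reordered, which a positional `td[2]` would not.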
With this query I converted the table into a CSV file, which demonstrates YQL’s ability to query against CSV files. Open in console.
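A query against a CSV file uses YQL's `csv` table, which accepts a `columns` key to name the fields. The sketch below assumes a placeholder file URL and hypothetical column names, because the real file and its headers are not given here. For comparison, it also names the columns of a tiny invented CSV locally with Python's `csv.DictReader`, which plays a similar role.

```python
import csv
import io

# Hypothetical placeholder for the converted CSV file.
CSV_URL = "http://example.com/soybean-rust.csv"

# Hypothetical column names; the real headers are not given in this post.
COLUMNS = ["state", "county", "date"]

# YQL statement that reads the file and labels its columns.
csv_query = 'select * from csv where url="{}" and columns="{}"'.format(
    CSV_URL, ",".join(COLUMNS)
)

# Invented stand-in data, used only to show the effect of naming columns.
TOY_CSV = "state-1,county-1,date-1\nstate-2,county-2,date-2\n"

# Local analogue: attach the same names to each row's fields.
rows = list(csv.DictReader(io.StringIO(TOY_CSV), fieldnames=COLUMNS))

print(csv_query)
print(rows[0]["county"])
```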