Posted on: 06 September 2010 in yql, science, scripting, hacking, cool, solo10
Last Saturday at Science Online London I gave a quick tutorial on YQL and how it might be used to mash up scientific data sets. Below I list some of the sample queries that I was playing with. Before you get started with the console, have a look through the documentation. I got a lot of mileage out of the part about filters and joins. The blog post by Paul Hogan on using YQL for library mashups was also very helpful.
In my presentation I was originally looking at extracting data from a report on the spread of soybean rust.
Here are a few sample queries to get you started:
1:
select * from html where
url="http://sbr.ipmpipe.org/cgi-bin/sbr/county_info.cgi?date=2010-07-13&pest=soybean_rust&host=All%20Legumes/Kudzu"
This query just pulls all of the HTML from the page. Open in console.
2:
select * from html where
url="http://sbr.ipmpipe.org/cgi-bin/sbr/county_info.cgi?date=2010-07-13&pest=soybean_rust&host=All%20Legumes/Kudzu"
and xpath='//table'
This query extracts only the table element from the page. Open in console.
3:
select * from html where
url="http://sbr.ipmpipe.org/cgi-bin/sbr/county_info.cgi?date=2010-07-13&pest=soybean_rust&host=All%20Legumes/Kudzu"
and xpath='//table/tr/td[2]'
This query pulls the second cell (td) from each row of the table. Open in console.
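To see what the td[2] step selects without going through the console, here is a minimal local sketch using Python's standard library rather than YQL itself. The table contents are made-up placeholders, not the real report data, and the only change to the XPath is the leading dot that ElementTree requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the report page (not the real data).
PAGE = """<html><body>
  <table>
    <tr><td>2010-07-01</td><td>Example County A</td><td>Confirmed</td></tr>
    <tr><td>2010-07-02</td><td>Example County B</td><td>Scouted</td></tr>
  </table>
</body></html>"""

root = ET.fromstring(PAGE)

# ElementTree only accepts relative paths, so //table/tr/td[2]
# is written as .//table/tr/td[2]. Positions are 1-based, as in XPath.
second_cells = [td.text for td in root.findall(".//table/tr/td[2]")]

for cell in second_cells:
    print(cell)
```

Each row contributes exactly one cell, so the result has one entry per table row.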
4:
select * from html where
url="http://www.mulvany.net/files/ipmsemanticpipe.html"
and xpath='//table/tr/td[@id="status" and p="Confirmed"]/..'
For this query I copied the table onto my own server and added some basic proto-semantic markup to the column descriptors. I could then call out specific columns from the table. Open in console.
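The XPath in query 4 does two things: it finds the status cell whose paragraph reads "Confirmed", then steps up to the parent row with "/..". A rough sketch of the same selection in Python follows. The markup is only my guess at the shape implied by the XPath (td elements carrying an id, each wrapping a p), and the rows are invented. ElementTree does not support "and" inside a predicate, so the two conditions are written as chained predicates, which select the same cells.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, shaped the way the XPath in query 4 implies:
# each td carries an id naming its column and wraps its value in a p.
PAGE = """<html><body>
<table>
  <tr>
    <td id="date"><p>2010-07-01</p></td>
    <td id="place"><p>Example County A</p></td>
    <td id="status"><p>Confirmed</p></td>
  </tr>
  <tr>
    <td id="date"><p>2010-07-02</p></td>
    <td id="place"><p>Example County B</p></td>
    <td id="status"><p>Scouted</p></td>
  </tr>
</table>
</body></html>"""

root = ET.fromstring(PAGE)

# Chained predicates stand in for: td[@id="status" and p="Confirmed"]
# The trailing /.. moves from the matching cell up to its row.
rows = root.findall(".//table/tr/td[@id='status'][p='Confirmed']/..")

# Flatten each matching row into a list of its cell values.
confirmed = [[td.findtext("p") for td in row] for row in rows]

for row in confirmed:
    print(row)
```

Only the rows whose status cell matches are returned, but each one comes back whole, which is what makes the other columns available.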
5:
select * from csv WHERE
url="http://www.mulvany.net/files/ipmpipe.csv"
and columns='date,place,status'
and status='Confirmed'
For this query I converted the table into a CSV file and named its columns in the query so that I could filter on the status column. This demonstrates YQL's ability to query against CSV files. Open in console.
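The same naming and filtering step can be sketched locally with Python's csv module. This is only an illustration: the rows below are invented, and I am assuming the file has no header row, so the column names are supplied the way the columns key supplies them in the query.

```python
import csv
import io

# Hypothetical sample rows; the real file lives at the URL in query 5.
CSV_TEXT = """2010-07-01,Example County A,Confirmed
2010-07-02,Example County B,Scouted
2010-07-03,Example County C,Confirmed
"""

# fieldnames plays the role of columns='date,place,status' in the query.
reader = csv.DictReader(io.StringIO(CSV_TEXT),
                        fieldnames=["date", "place", "status"])

# Equivalent of: and status='Confirmed'
confirmed = [row for row in reader if row["status"] == "Confirmed"]

for row in confirmed:
    print(row["date"], row["place"])
```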
Comments
If you would like to leave a comment then email me at ian@mulvany.net, and if I like it I'll add it to the post.