science scraping with YQL
Mon Sep 6, 2010
Last Saturday at Science Online London I gave a quick tutorial on YQL and how it might be used to mash up scientific data sets. Below I list some of the sample queries that I was playing with. Before you get started with the console, have a look through the documentation. I got a lot of mileage out of the part about filters and joins. The blog post by Paul Hogan on using YQL for library mashups was also very helpful.
In my presentation I was originally looking at extracting data from a report on the spread of soybean rust.
Here are a few sample queries to get you started:
1:
select * from html where
url="http://sbr.ipmpipe.org/cgi-bin/sbr/county_info.cgi?date=2010-07-13&pest=soybean_rust&host=All%20Legumes/Kudzu"
2:
select * from html where
url="http://sbr.ipmpipe.org/cgi-bin/sbr/county_info.cgi?date=2010-07-13&pest=soybean_rust&host=All%20Legumes/Kudzu"
and xpath='//table'
3:
select * from html where
url="http://sbr.ipmpipe.org/cgi-bin/sbr/county_info.cgi?date=2010-07-13&pest=soybean_rust&host=All%20Legumes/Kudzu"
and xpath='//table/tr/td[2]'
4:
select * from html where
url="http://www.mulvany.net/files/ipmsemanticpipe.html"
and xpath='//table/tr/td[@id="status" and p="Confirmed"]/..'
5:
select * from csv WHERE
url="http://www.mulvany.net/files/ipmpipe.csv"
and columns='date,place,status'
and status='Confirmed'
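If you want to check what queries 3, 4 and 5 are selecting without going through the console, the same filters can be roughly reproduced locally. The sketch below uses only the Python standard library. The sample table and CSV rows are invented for illustration and are not the real ipmPIPE data, and the function names are hypothetical. It also assumes the markup is well-formed XML, which real-world HTML often is not.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Invented sample markup for illustration; NOT the real ipmPIPE page.
SAMPLE_HTML = """
<html><body>
<table>
  <tr><td>2010-07-13</td><td>Baldwin County</td><td id="status"><p>Confirmed</p></td></tr>
  <tr><td>2010-07-13</td><td>Leon County</td><td id="status"><p>Scouted</p></td></tr>
</table>
</body></html>
"""

# Invented sample rows for illustration; NOT the real ipmpipe.csv.
SAMPLE_CSV = (
    "2010-07-13,Baldwin County,Confirmed\n"
    "2010-07-13,Leon County,Scouted\n"
)


def second_cells(html_text):
    """Rough analogue of query 3: the second cell of every table row."""
    root = ET.fromstring(html_text.strip())
    return [td.text for td in root.findall(".//table/tr/td[2]")]


def confirmed_places(html_text):
    """Rough analogue of query 4: keep rows whose status cell says Confirmed.

    Returns the text of the second cell of each matching row.
    """
    root = ET.fromstring(html_text.strip())
    places = []
    for tr in root.findall(".//table/tr"):
        status = tr.find("td[@id='status']/p")
        if status is not None and status.text == "Confirmed":
            places.append(tr.find("td[2]").text)
    return places


def confirmed_csv(csv_text):
    """Rough analogue of query 5: name the columns, then filter on status."""
    reader = csv.DictReader(
        io.StringIO(csv_text), fieldnames=["date", "place", "status"]
    )
    return [row for row in reader if row["status"] == "Confirmed"]


if __name__ == "__main__":
    print(second_cells(SAMPLE_HTML))
    print(confirmed_places(SAMPLE_HTML))
    print(confirmed_csv(SAMPLE_CSV))
```

Passing `fieldnames` to `csv.DictReader` plays the same role as `columns='date,place,status'` in query 5: the file has no header row, so the names are supplied by the caller. ElementTree only supports a subset of XPath, which is why the `and` condition from query 4 is written here as an ordinary Python check.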