BarCamp Cambridge - teaching computers to understand text, Peter Corbett
Mon Jul 14, 2008
has a desk at both the Computer Lab and the chemistry lab.
computational linguistics for chemistry: auto-detect the language in chemistry papers to try to recognise chemicals and mark them up.
this supplements the mark-up from publishers.
can draw the chemicals and overlay the annotations on the paper.
some problems are that there can be new names in papers, compact names, and names that include extra hyphens; this program can deal with these kinds of things.
it can also do systematic name parsing.
this is the core technology; you can do things like search for alkaloids in your paper, or do a document dump.
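To make "search for alkaloids" concrete, here is a minimal sketch, not how Peter's software actually does it. It assumes the recogniser has already produced a list of chemical mentions, and that a hypothetical `IS_A` table links compounds to their classes (a real system would presumably use a proper chemical ontology). A class search is then just a walk up the is-a links.

```python
# Hypothetical is-a links, hand-written for this sketch only.
IS_A = {
    "morphine": {"opiate", "alkaloid"},
    "codeine": {"opiate", "alkaloid"},
    "caffeine": {"alkaloid"},
    "opiate": {"drug"},
    "ethanol": {"alcohol"},
}

def ancestors(term, is_a):
    """All classes reachable from term by following is-a links."""
    seen, stack = set(), [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def search_by_class(mentions, cls, is_a):
    """Keep the mentions whose compound is (transitively) a member of cls."""
    return [m for m in mentions if cls in ancestors(m.lower(), is_a)]

print(search_by_class(["Morphine", "ethanol", "caffeine", "benzene"], "alkaloid", IS_A))
```

Because the walk is transitive, a search for a broader class such as "drug" would also pick up morphine via "opiate".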
this seems to run within a browser.
he ran the software over a corpus of about 100 papers and created a search engine out of this? I might be wrong about that.
can create an SVG.
can go from plain text to something like a connection layout using information-rich markup.
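A toy illustration of going from plain text to inline markup. The `<chem>` tag, the `LEXICON` list and `mark_up` are all made up for this sketch; the real system recognises names it has never seen before rather than looking them up in a fixed list.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical toy lexicon standing in for a real chemical name recogniser.
LEXICON = ["benzene", "acetic acid", "2-hydroxybenzoic acid", "morphine"]

def mark_up(text, lexicon):
    """Wrap each lexicon match in a <chem> element and XML-escape the rest."""
    # Longest names first, so the longest alternative is tried first at each position.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        # Don't match inside a longer word or hyphenated name.
        r"(?<![\w-])(" + "|".join(re.escape(n) for n in names) + r")(?![\w-])",
        re.IGNORECASE,
    )
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))
        out.append("<chem>" + escape(m.group(0)) + "</chem>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "<p>" + "".join(out) + "</p>"

print(mark_up("Morphine & benzene were mixed.", LEXICON))
```

The boundary checks mean "benzenes" is left alone, which is exactly the kind of case where a lookup list falls short and a trained recogniser is needed.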
the RSC is using this software, along with human clean-up, to create markup of chemistry papers.
you can then do semantic search over papers.
Small natural language processing trick: imagine we were interested in opiates. we could just ask Google for "opiates", but asking for "opiates such as" will give you a much better return on results.
I just checked this and it works.
there are many patterns like this; they are known as Hearst patterns.
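A minimal sketch of the "X such as Y, Z and W" Hearst pattern, using plain regular expressions. Real systems match over noun-phrase chunks; here each phrase is assumed to be a single word, which keeps the sketch short but will miss multi-word names. `extract_such_as` is a hypothetical helper, not part of Peter's software.

```python
import re

# One "word": letters, digits or hyphens, but never the bare words "and"/"or".
WORD = r"(?!(?:and|or)\b)[A-Za-z0-9-]+"
SUCH_AS = re.compile(
    rf"\b(?P<hyper>{WORD})\s*,?\s+such\s+as\s+"
    rf"(?P<hypos>{WORD}(?:\s*,\s*{WORD})*(?:\s*,?\s*(?:and|or)\s+{WORD})?)"
)
# Separators inside the list: ", ", ", and ", " and ", " or ".
LIST_SEP = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")

def extract_such_as(text):
    """Return (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for m in SUCH_AS.finditer(text):
        hyper = m.group("hyper").lower()
        for hypo in LIST_SEP.split(m.group("hypos")):
            if hypo:
                pairs.append((hypo.lower(), hyper))
    return pairs

print(extract_such_as("Opiates such as morphine, codeine and heroin are analgesics."))
```

Each match gives an "is a kind of" edge, which is the raw material for the kind of relationship network described next.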
he did a pass over abstracts on PubMed for these kinds of patterns to make a network of relationships.
the result is not a connected graph.
dot (Graphviz) fails on large graphs, but the demo does show that you can automate the discovery of relationship networks.
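A rough sketch of turning extracted relation pairs into a graph, assuming pairs in (hyponym, hypernym) order. It finds the connected components (the network is not one connected graph) and writes Graphviz DOT text. Rendering each component separately is one possible way to keep individual `dot` runs small; that is a suggestion, not something from the talk.

```python
from collections import defaultdict

def connected_components(edges):
    """Group nodes into connected components, treating edges as undirected."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        comps.append(comp)
    return comps

def to_dot(edges, name="relations"):
    """Serialise edges as a DOT digraph, quoting node names."""
    def q(s):
        return '"' + s.replace('"', '\\"') + '"'
    lines = [f"digraph {name} {{"]
    lines += [f"  {q(a)} -> {q(b)};" for a, b in edges]
    lines.append("}")
    return "\n".join(lines)

edges = [("morphine", "opiates"), ("codeine", "opiates"), ("ethanol", "alcohols")]
print(len(connected_components(edges)))
print(to_dot(edges))
```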
you can do reasoning on structure as well as process (now he mentions lots of chemical names that I know nothing about).
a few bits of wisdom from this:
most of the information has come from biochemists rather than chemists; more biologists are into open science and open databases.
chemistry has been mostly captured by commercial interests, so it is hard to get free chemistry data.
next is to define what you are looking for?
you want to be able to evaluate how well the software has done: how do you post-annotate the documents?
in a lot of text there is a difference between what you think the world looks like and how it is described in the literature, so even when you get people to ..
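One standard way to score the software against hand-annotated documents is precision, recall and F1. Below is a minimal sketch with exact span matching; the (start, end, type) tuples and the "CM" label are hypothetical, and real evaluations often also report partial-overlap scores.

```python
def evaluate(predicted, gold):
    """Exact-match precision, recall and F1 over (start, end, type) spans."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # spans found by the software and by the annotators
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {(0, 8, "CM"), (10, 17, "CM"), (20, 27, "CM")}
pred = {(0, 8, "CM"), (10, 17, "CM"), (30, 35, "CM"), (40, 45, "CM")}
print(evaluate(pred, gold))
```

Here two of four predictions are right (precision 0.5) and two of three gold names are found (recall 2/3), giving F1 = 4/7. The scores are only as good as the gold annotations, which is exactly the annotation problem raised above.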
question about confidence levels: the most recent piece of the software has confidence levels. rare events don't provide good confidence levels.
it could depend on what you are looking for.
Peter thinks that confidence is important for these systems.
e.g. "a such as b", where b might be a chemical but you are not sure: if later you find in your search that a is indeed a chemical, it raises your confidence that b is indeed a chemical.
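One simple way to model that boost is a noisy-OR combination, a swapped-in technique that was not described in the talk. Each "a such as b" match is treated as independent evidence that b is a chemical, with strength `w` times the confidence that a is a chemical. `boost` and the reliability weight `w = 0.8` are hypothetical.

```python
def boost(prior, pairs, w=0.8):
    """Noisy-OR update of 'is a chemical' confidences.

    prior: dict term -> recogniser's probability that the term is a chemical
    pairs: (hyponym b, hypernym a) tuples from "a such as b" matches
    w:     hypothetical reliability of the pattern
    Uses each hypernym's prior confidence, so evidence is not fed back twice.
    """
    not_chem = {t: 1.0 - p for t, p in prior.items()}
    for b, a in pairs:
        not_chem[b] = not_chem.get(b, 1.0) * (1.0 - w * prior.get(a, 0.0))
    return {t: 1.0 - q for t, q in not_chem.items()}

print(boost({"opiates": 0.95, "oxymorphone": 0.4}, [("oxymorphone", "opiates")]))
```

With these made-up numbers, a fairly unsure 0.4 for "oxymorphone" rises to about 0.856 because "opiates" is confidently a chemical class. A term seen in only one such sentence still gets one noisy piece of evidence, which echoes the point about rare events.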
Q: is there any way to automate the detection of chemical acronyms? it turns out that this is not always nice, but you can do some of it.
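For the "some of it" part, a well-known general approach (not necessarily what Peter uses) is the Schwartz and Hearst abbreviation algorithm. It finds "long form (SF)" and matches the short form's characters right to left inside the preceding words. This is a simplified sketch; `find_abbreviations` and `best_long_form` are hypothetical names.

```python
import re
from typing import Optional

PAREN = re.compile(r"\(([^()]{2,10})\)")  # short-form candidates: 2-10 chars in brackets

def best_long_form(short: str, long: str) -> Optional[str]:
    """Match short-form characters right to left in the candidate long form."""
    s_idx, l_idx = len(short) - 1, len(long) - 1
    while s_idx >= 0:
        c = short[s_idx].lower()
        if not c.isalnum():
            s_idx -= 1
            continue
        # The first short-form character must also start a word.
        while l_idx >= 0 and (
            long[l_idx].lower() != c
            or (s_idx == 0 and l_idx > 0 and long[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    start = long.rfind(" ", 0, l_idx + 1) + 1  # back up to the word start
    return long[start:]

def find_abbreviations(text):
    """Return {short_form: long_form} for 'long form (SF)' patterns."""
    found = {}
    for m in PAREN.finditer(text):
        short = m.group(1).strip()
        if not short or not short[0].isalnum() or not any(ch.isalpha() for ch in short):
            continue
        # Look at most min(|SF| + 5, 2 * |SF|) words back, as in the original paper.
        words = text[:m.start()].split()
        window = " ".join(words[-min(len(short) + 5, 2 * len(short)):])
        long = best_long_form(short, window)
        if long and len(long) > len(short):
            found[short] = long
    return found

print(find_abbreviations("Samples were dissolved in dimethyl sulfoxide (DMSO) and stored."))
```

Chemistry is where this gets "not always nice": many chemical acronyms do not follow letter-by-letter matching, so a heuristic like this only catches the well-behaved cases.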