Some thoughts on FORCE2015, science trust and ethics.
Mon Jan 19, 2015
Last week I was at the FORCE2015 conference. I enjoyed it greatly. This was the 2015 instance of the FORCE11/Beyond the PDF conference. I’d been aware of these meetings since the first one was announced back in 2011, but this was my first chance to attend one. (If I recall, I’d even been invited to the Dagstuhl workshop, but had been unable to attend. I’d been to one Dagstuhl workshop on science and social networks many years ago, and that had been one of the best short meetings that I’d ever attended, so I’d been sad not to be able to go to the Beyond the PDF meeting.)
This meeting was one of those where I felt totally surrounded by my tribe. I’ve been despairing at the inability of publishers to do much better than produce PDFs for research communication, and I’m constantly grading our efforts as “must do better”. This was a meeting mostly filled with people who are interested in making things better (by which I mean the creation of tools and systems that help researchers, rather than pin researchers into metrics traps).
I read most of the 70 or so posters during the poster session. When I got home my wife said “nobody reads all the posters”. Some of the posters were great, and some were awful, but I read through most of them. I’ve put up a gallery of posters on my Flickr account.
So, here are some relatively unconnected thoughts on the conference.
Riffyn looks like it might actually be useful. I’d been hearing about this product for about half a year, and there is a small article in Nature that very much does not describe it, but up until this meeting it looked very much like vapourware, with no concrete explanation of how it might achieve what the company says it will. After chatting to the presenter and seeing his short presentation, in my mind there are two components to the value offering. The first is a way to stream readout data from lab equipment to a central data collection service. The second is a software suite that allows you to modularise and encapsulate the components and variables of the experiment under consideration. At eLife we have been looking a lot at infrastructure for log analysis and at tools that can operate on streaming data, and I was reminded that the big data industry (sysops, devops) has been churning out tools to do a lot of this kind of thing for a few years now, so again it might be an example of where the research bench could learn from the software industry. I would imagine that a product like this will be of more assistance at the point where one wants to move from exploratory research to translation, and that initially the kinds of labs that will be set up to take advantage of a system like this are likely to be the more organised ones, who probably don’t need something like this. It also reminds me of the experiments that Jean-Claude Bradley was doing with open notebook science back in the mid 2000s.
I greatly enjoyed the session on reproducibility. The speakers were mostly in agreement. It was mentioned in the session that some of the nuances over the difference between reliability and reproducibility were just semantic, but of course the thing about semantics is that it is a reflection of how we use language, and I found it valuable to be presented with different vantage points from which to look at the topic. In a nutshell, when we say we want reproducible science, what we really want is for the claims that are made about the world to be reliable ones. We want an operational definition of truth (there is an outstanding episode of In Our Time on the topic of truth). Bernard Silverman described it in the following way - a claim or a paper is a virtual witness to the act of a person finding something out about the world. Witnessing requires trust, and trust requires the existence of a network on which to build that trust, and that network needs an ethical framework within which to operate. Science has an ethical framework (it seems to me that at the heart of this ethical framework are that we don’t lie about reality, and that we grant recognition and dignity to others and to the natural world). In this context the reliability of results is a necessary, but not sufficient, condition for ethical work.
As an aside, how this ethical framework has emerged in the sciences is fascinating. For every behaviour that we idealise, it is easy to look back at distant and recent history to see successful practitioners and results that contravene our ideas of ethical science. The first surgeon to succeed - Christiaan Barnard - experimented in a totally unethical way. Mendeleev’s results were not trustworthy, and Newton did not openly communicate his work.
As a further aside, it also seems to me that online networks that get created by commercial entities operate within their own micro-ethical framework, and as long as we passively participate in them with no say in their governance or mores of operation, the likelihood is that they will have a strong tendency to overstep our ideas of ethical behaviour.
There was some dissent amongst the speakers (well, from one speaker) on the need for reproducibility. He did say that he didn’t think he was going to have a very receptive audience for his views, and that’s certainly the case for this member of the audience. I think he was complaining about the idea of the need for reproducibility, basing this on the claim that we have never had reproducibility in science, whereas if we understand that what we want is reliability, and if we recognise that we are concerned that there may be areas of research that are highly unreliable, then his objection falls apart. I think it is important to ground our calls for reliability in science in the instances where we fear there may be issues. That covers a number of behaviours - making results available on request, providing all of the steps in the paper that are required to replicate an experiment, not making shit up. Things like the Reproducibility Project: Cancer Biology can help to provide a survey overview of a field and give some evidence on the nature of the challenges around reliability that we may face in a specific discipline. One of the people working on this project mentioned to me that at least one of the papers that they wanted to do a reproduction study on was impossible because the lab involved refused to share critical material with them. Many of the other papers that they are looking at are impossible to replicate just from the descriptions given in the published papers (most of these labs are helping, but perhaps it exposes the limits of the research paper as a format for communicating extremely technical and complex information).
I believe that this speaker also argued that asking reviewers to understand the details of a paper to the extent of being able to determine whether it could be reproduced or not would add too much burden on the reviewers, leading to reviews of much lower quality. I guess that depends on the nature of the review being asked for - whether it’s asking to check for rigour, novelty or eLife-ness - but I would hope that one would not drop the requirement for rigour just in the search for novelty, and I would hope that reviewers would keep an eye open to ask whether the claims made in the paper can be supported by the evidence and the tools used to gather that evidence.
On the topic of statistics this leads to an interesting question. At this point it’s well documented that many papers use incorrect statistics, or statistics of low power. It might well be that a reviewer does not have the time to run these numbers, so it would be great if we could automate these kinds of checks. A precursor for that is making the underlying data available.
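As a rough illustration of what one such automated check might look like, here is a minimal sketch in Python. It assumes that a paper reports a z statistic alongside a two-sided p-value, and it simply recomputes the p-value to see whether the two agree to the reported precision. The function names and the example figures are hypothetical, invented purely for illustration. A real checker would also need to handle t, F and chi-squared tests, one-sided tests, and the messy business of extracting the numbers from the text in the first place.

```python
import math


def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def is_consistent(z: float, reported_p: float, decimals: int = 3) -> bool:
    """True if the reported p-value matches the recomputed one,
    allowing for rounding to the given number of decimal places."""
    recomputed = two_sided_p_from_z(z)
    tolerance = 0.5 * 10 ** -decimals + 1e-12
    return abs(recomputed - reported_p) <= tolerance


# Hypothetical reported results: (label, z statistic, reported p-value).
reported = [
    ("result A", 1.96, 0.050),
    ("result B", 2.58, 0.010),
    ("result C", 1.20, 0.010),  # the p-value does not match the statistic
]

# Collect the labels of results whose statistic and p-value disagree.
flagged = [label for label, z, p in reported if not is_consistent(z, p)]
print(flagged)
```

A check like this only tests whether the reported numbers are internally consistent. It says nothing about whether the right test was chosen or whether the study had enough power, which is where access to the underlying data would matter.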
When it comes to making the underlying data available, someone from the audience raised the question of what we are to do with results that come from resources that have un-recreatable data sets (like Google and Facebook). I think the inference was that such places hide their data and don’t make it available, and yet do a lot of research, so can we trust the results that come from these places? The panel had a good comment on this. They extended the example to that of CERN, a facility that it will be impossible to ever replicate. Many of the experiments that happen at CERN can never be replicated, but the people at CERN operate as if they could. By putting in place working methods that make their work possible in principle to recreate, they produce better science. As one of the panelists said - if an experiment is truly reproducible, then you would never need to reproduce it (which comes back down to the issue of trust and reliability). Indeed it would be unethical to reproduce certain classes of research, such as widespread epidemiological or clinical trials. Once you have a cure, it would be unethical to withhold it for the sake of replicating the experiment. I think that the cases of big data - such as at Google and Facebook - have at least two further motivations to keep them honest, to a certain degree. The first of these is the profit motive. They are not in principle doing experiments with their data for the sake of it; they are attempting to gain market share and to devise products that people will want to use. The success of their experiments is based on whether what they produce is used or not, not on whether they get a publication. Another powerful force at work in these organisations is the need to share code and resources, since effectively most of their engineers are commoditised, and are replaceable.
A new engineer coming into a team has to be able to get spun up quickly, and needs to be able to run the analysis or deploy the code towards the shared good. In that light they use significant amounts of code review, they code to common standards, and they encapsulate their work in a way that makes it easy to scale horizontally. They keep the configuration of their systems in code, in a self-documenting way, and, at least within the organisation, have broad openness about how they are doing what they are doing, which in a pleasing turn of thought brings us back to Riffyn, which is hoping to provide tools to allow that type of commoditisation to enter the lab.
Onto another track now. I’d like to recount a small conversation that I had with one of the people presenting a poster. The idea looked interesting, but challenging. I started to ask the person about how their idea would be useful to people, and I was told all about the potential benefits. I’m always interested in how an idea is going to get market share, because that’s often the hardest thing, especially in a marketplace that values conservative behaviour. At this point it became clear that nothing had been built yet, so I asked if this person had done any user testing on the basic idea, or had any plans to. I was told that there were no plans to do any user testing, and that in fact getting to a working prototype was probably out of scope, but that modelling the semantic relationships of the idea, and doing some computer science work on this side, were probably more than enough work for the PhD that the person was working on. Hmmmm. I think this is a weakness of the more academic side of this field. I want to see things emerge that are useful for the research enterprise, and I want to see us build on top of the great corpus of open content that is now emerging.
Phil Bourne mentioned this in his closing remarks at the conference, that there is still a way to go before open access can really prove its worth. Indeed some of the more interesting technology demonstrations I saw were working on top of closed source software, analysing closed corpora of content provided by the big publishers. My friends, we need to build. Together we need to build the future that we want to see emerge on top of the great potential of open content. You don’t need permission, just go and start now.