PLOS are looking for a new CEO

So I hear that PLOS are looking for a new CEO. They are making the process fairly open, so if you are interested you can read more here.

I got to thinking about some of the challenges and opportunities facing PLOS over the weekend. Over the years I’ve gotten to know a lot of PLOS folk, and I think it’s an amazing organisation. It has proved the viability of open access, and their business model is being copied by a lot of other publishers. At the same time they have had a fairly high rate of turnover of senior staff in the last couple of years. So what are the likely challenges that a new CEO will face, and what should they do about them? (Time for some armchair CEO’ing.)

The condensed view of PLOS’s mission is that they want to accelerate progress in science and medicine. At the heart of their mission is the belief that knowledge is a public good, and, leading on from that, that the means for transmitting that knowledge (specifically research papers) should also be a public good.

It was founded in 2001 by three visionaries, and it was configured to be a transformational organisation that could catalyse radical change in the way that knowledge is created and disseminated, initially positioned in particular against the subscription model for distributing scholarly content.

Since launching, PLOS has found massive success with the introduction of PLOS ONE, currently the largest journal in the world. That rapid growth led to a period of significant scaling and adjustment for the organisation, where it had to keep running at full pace in order to stay just about on top of the flood of manuscripts that were coming its way. This also created a big revenue driver for the organisation that has led to PLOS ONE being the engine that drives the rest of PLOS.

So now we have the strategic crux facing any incoming CEO. The organisation has an obligation to be radical in its approach to furthering its mission, but at the same time the engine that drives the organisation operates at such scale that changes to the way it works introduce systemic risks to the whole organisation. You also have to factor in that the basic business model of PLOS ONE is not defensible, and market share is being eroded by new entrants, in particular Nature Communications, so it is likely that making no changes also represents a risky strategy.

So what to do?

There are probably many routes to take, and there are certainly a large number of ongoing activities that PLOS is engaged in as part of the natural practice of any organisation. I think the following perspectives might have some bearing on where to go. As with any advice, it’s much easier to throw ideas across the wall when you don’t have any responsibility for them, but I’m going to do it anyway in the full awareness that much of what I say below might not actually be useful at all.

Changing PLOS does not change scientists

PLOS has shown that Open Access can succeed, and its existence has been critical in confirming the desire of researchers who want research conducted as an open enterprise. That has allowed those researchers to advocate for something real, rather than something imagined. However, there remain a large number of researchers for whom the constraints of the rewards system they operate under outweigh any interest they may have in open science. I think it is important to recognise that no matter what changes PLOS introduces, those changes on their own will not be sufficient to change the behaviour of all (or even a majority) of researchers. Being able to show plausible alternatives to the existing system is important, but it is also important to continue to work closely with other key actors in the ecosystem to try to advance systemic change. What that tells me is that the bets PLOS ought to take on to create change have to be weighed against their likelihood of affecting all researchers, and the risks they introduce to the current business model of PLOS.

On the other hand you do want to progressively make it possible for people to be more open in how they conduct science. We talked a lot at eLife about supporting good behaviours, and you could imagine using pricing or speed mechanisms as a way of driving that change (e.g. lower costs for publishing articles that have been placed on a preprint server). One does have to be careful with pricing in academic circles, as publication costs are rarely a factor in an academic’s decision about where to publish, but generally I’m in favour of providing different routes through a product to different users, and making the routes that promote the behaviours I support easier or cheaper. (GitHub does this brilliantly by making open code repositories free to host, and only making you pay if you want to keep your code private.)

How do you balance risk?

One of the things that is consistent in innovation is that we mostly don’t know what is going to succeed. I expect that the success of PLOS ONE probably took PLOS by surprise. It was a small change to an existing process, but it had a dramatic effect on the organisation.

It seems to me that what you want to do is to have a fair number of bets in play. If we accept that we mostly won’t know what is going to succeed in the first place, then the key thing is to have a sufficient number of bets in place that you get coverage over the landscape of possibilities, and you iterate and iterate and iterate on the ones that start working well, and you have the resolve to close down the ones that are either making no progress or are getting stuck in local minima.

Product Horizons

I like the idea of creating a portfolio of product ideas around the three horizons principle. There are lots of ways of determining whether your bets are paying off. One of the things that I think PLOS needs to do is ensure that a certain minimum of its financial base is being directed towards this level of innovation.

I don’t think that is a problem at all for the organisation in terms of creating tools like ALM and their new submission and peer review system, but I’m not clear on whether they have been doing this strategically across all of the areas where they want to have an impact. That’s not an easy thing to do: balancing ongoing work and new ideas, being disciplined enough to move on, and being disciplined enough to keep going, in the knowledge that real success sometimes takes you by surprise.

PLOS may need diversification

As I referred to above, the business model of PLOS, as it’s currently configured, is not easily defensible. Many other publishers have created open access journals with publishing criteria based on the solidity of the science rather than impact. The Nature-branded version of this is now attracting a huge number of papers (one imagines driven by the brand overflow from the main Nature titles). This suggests to me that there is value in diversifying the revenue streams that PLOS generates. These could be built around further services to authors, to funders, or to other actors in the current scholarly ecosystem. Here are three ways to look at the market.

One: what will the future flow of research papers look like, and how does one capture an increasing share of it? Will increased efficiency in time to publication and improved services around the manuscript be sufficient? How might the peer review system be modified to make authors happier?

Two: how will funding flow to support data and code publishing, and will there be funding for creating new systems of assessment? Can any services that benefit PLOS be extended to benefit others in the same way?

Three: if you are creating platforms and systems that are flexible and support the existing scale of PLOS, what might the marginal investment be to extend those platforms so that others could use them (societies, small groups of academics that want to self-publish, national bodies, or organisations from emerging research markets)?

The key here is not to suggest that PLOS has to change for its own sake, but rather to be clear about exploring these kinds of options strategically. It might be that you can create streams of revenue that make innovation self-supporting; it might be that you hit on a way to upend the APC model. These efforts can be seen as investment in case the existing revenue driver comes under increasing pressure in the future.

Ultimately you want to build a sustainable engine for innovation.

Who does all of the work?

In the end all of the work is done by real people, and the key thing any new CEO is going to have to do is bring a clarity of purpose, and support the staff who are in the thick of things. What I’ve seen cause the most dissatisfaction in staff (aside from micromanagement - a plague on the houses of all micro-managers) is a lack of ability to ship. This usually comes down to one of two causes: either priorities change too quickly, or unrealistic deadlines are set that lead to the introduction of technical debt, which causes delays in shipping. It’s key to try to identify bottlenecks in the organisation, and (as contradictory as it might sound) to try to create slack in people’s schedules to allow for true creative work to happen.

If everyone is going open access, why should PLOS exist? Has it now succeeded in some way?

Given that almost all new journal launches are now open access journal launches, has PLOS effectively won? Could PLOS as it exists essentially go away? I think within one area of how we get to an open research ecosystem that might actually be true; however, that only speaks to access to the published literature. Open science requires so much more than that. It needs transparency around review, efficiency in getting results into the hands of those who need them, data and code that are actionable and reusable, a funding system that abandons its search for the chimera of impact, and an authoring system that is immediately interoperable with how we read on the web today.

So what to do with PLOS as it’s currently configured? I see the current PLOS, with its success, as an opportunity to generate the revenues to continue to explore and innovate in these other areas, but I think that the current system should be protected to ensure that this is possible.

In the end of the day, what does a CEO do?

I can’t remember where I read it now, but one post from a few years back struck me as quite insightful. It said that a CEO has three jobs:

  • make sure the lights stay on
  • set the vision for the organisation
  • ensure that the best people are being hired, and supported

PLOS is in a great position at the moment. It has a business model that is working right now, and is operating at a scale that gives any incoming CEO a good bit of room to work with. It’s a truly vision-led organisation, whose ultimate goal is one that can benefit all of society. It has great people working for it.

I don’t think that the job is in any way going to be a gimme, but it’s got to be one of the most interesting challenges out there in the publishing / open science landscape at the moment.

Reverse DOI lookups with Crossref

Today I had a need to think about how to do a reverse lookup of a formatted citation to find a DOI.

@CrossrefOrg helped out and pointed me to the reverse API endpoint. It works like this:

http://api.crossref.org/reverse

Create a JSON payload file “citation.json” formatted as follows:

[
  "Curtis, J. R., Wenrich, M. D., Carline, J. D., Shannon, S. E., Ambrozy, D. M., & Ramsey, P. G. (2001). Understanding physicians’ skills at providing end-of-life care: Perspectives of patients, families, and health care workers. Journal of General Internal Medicine, 16, 41-49."
]

Call the API using curl (you need to set the Content-Type header to application/json):

$ curl -vX POST http://api.crossref.org/reverse -d @citation.json --header "Content-Type: application/json"

I then got the following response:

{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2016,10,25]],"date-time":"2016-10-25T11:17:12Z","timestamp":1477394232160},"reference-count":21,"publisher":"Springer Nature","issue":"1","content-domain":{"domain":[],"crossmark-restriction":false},"short-container-title":["J Gen Intern Med"],"cited-count":0,"published-print":{"date-parts":[[2001,1]]},"DOI":"10.1111\/j.1525-1497.2001.00333.x","type":"journal-article","created":{"date-parts":[[2004,6,9]],"date-time":"2004-06-09T16:44:02Z","timestamp":1086799442000},"page":"41-49","source":"CrossRef","title":["Understanding Physicians' Skills at Providing End-of-Life Care. Perspectives of Patients, Families, and Health Care Workers"],"prefix":"http:\/\/id.crossref.org\/prefix\/10.1007","volume":"16","author":[{"given":"J. Randall","family":"Curtis","affiliation":[]},{"given":"Marjorie D.","family":"Wenrich","affiliation":[]},{"given":"Jan D.","family":"Carline","affiliation":[]},{"given":"Sarah E.","family":"Shannon","affiliation":[]},{"given":"Donna M.","family":"Ambrozy","affiliation":[]},{"given":"Paul G.","family":"Ramsey","affiliation":[]}],"member":"http:\/\/id.crossref.org\/member\/297","container-title":["Journal of General Internal Medicine"],"original-title":[],"deposited":{"date-parts":[[2011,8,10]],"date-time":"2011-08-10T15:39:02Z","timestamp":1312990742000},"score":120.61636,"subtitle":[],"short-title":[],"issued":{"date-parts":[[2001,1]]},"alternative-id":["10.1111\/j.1525-1497.2001.00333.x"],"URL":"http:\/\/dx.doi.org\/10.1111\/j.1525-1497.2001.00333.x","ISSN":["0884-8734","1525-1497"],"citing-count":21,"subject":["Internal Medicine"]}}

From this we can see that Crossref suggests a DOI with a match score of about 120. There is some backslash escaping going on in the response, so the actual lookup URL is: http://dx.doi.org/10.1111/j.1525-1497.2001.00333.x

This directs us to the following article, which does seem to be the one that we are interested in.
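If you want to script this kind of lookup, here is a minimal Python sketch of the same call using only the standard library. The helper names are my own; the endpoint, payload shape (a JSON array holding one formatted citation), and the DOI/score fields are as shown above.

```python
import json
import urllib.request

CROSSREF_REVERSE = "http://api.crossref.org/reverse"

def build_reverse_request(citation):
    """Build the POST request: a JSON array holding one formatted citation."""
    payload = json.dumps([citation]).encode("utf-8")
    return urllib.request.Request(
        CROSSREF_REVERSE,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def best_match(response):
    """Extract the suggested DOI and its match score from a parsed response."""
    message = response["message"]
    return message["DOI"], message["score"]

# To run the lookup for real (requires network access):
# req = build_reverse_request("Curtis, J. R., et al. (2001). Understanding physicians' skills ...")
# with urllib.request.urlopen(req) as resp:
#     doi, score = best_match(json.load(resp))
```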

What do we mean when we talk about Big Data?

A blog post about this article provides the following definition of big data:

“High volume data that frequently combines highly structured administrative data actively collected by public sector organisations with continuously and automatically collected structured and unstructured real-time data that are often passively created by public and private entities through their internet.”

The article is behind a paywall, but the blog is pretty clearly laid out. The authors seem mostly concerned about how the term big data is used by researchers who are mostly coming from a background of working with public sector data.

My takeaways from the blog post are:

* public sector use of the term *Big Data* is sometimes divergent from what the term means in the private sector 
* real time data collection *could* be a vice in the public sector 
* Digital exhaust data is only coincidentally aligned with having any utility for answering public policy questions, and given that, is it at all suitable for such purposes? 
* The ethics of the use of this data are unclear 
* This kind of present-day Big Data is not representative of our full lives, nor representative of all citizens 
* There remains great potential, but excitement around this potential must be tempered with an understanding of the current inherent limitations of this resource. 

I think these are all reasonable positions to take at the moment; however, the definition of big data leaves open how we might interpret what high volume means.

A position I’m coming to about big data is that it’s mostly about how comfortable you feel with the data, and that one person’s big data is another’s batch job. What the explosion of data has created is an increase in the number of occasions where a particular researcher will hit against the limits of what is technically possible to them at that moment in time. Setting aside all of the questions about what is in the underlying data, and how well it may or may not fit the research question being asked, what I find very exciting is that the journey of gaining the capacity to work with the data that you think is big today will create a cohort of researchers who are unafraid to deal with what may be big for them tomorrow. In this way we create an environment of fantastically skilled researchers, who will be in a better position to tackle hard problems than they are today.

Hello SAGE!

I joined SAGE at the start of September. Hello SAGE!! Here I outline some of my initial impressions.

First up, I’ve been really delighted to meet so many great people at SAGE. I’ve received great support from everyone in the company. I generally find publishing folk to be very friendly. This is a friendly industry, working on the fabric of knowledge, knowing that your work can help to make a difference, trying to make the work of academics a bit easier. I believe that these are all things that can help to create a good environment for an industry to be situated in. All that aside, I’ve still been really impressed by how lovely everyone is. I think that comes from some initial interactions that I had way back in my first week, and it’s only continued through the weeks.

One obvious change at SAGE is the scale of the company. It’s a good bit bigger than eLife, and I’ve not worked in a company close to this size since 2010. At Mendeley, and later at eLife, I saw what happens as a company starts to grow out beyond the point where not everyone is able to know everything that is going on (that’s not a bad thing at all, just an inevitable part of the development of a company). Back when I was working in Springer and Nature my work mostly involved interacting with people in close proximity to my project. What I’m working on now is of interest across the company. Communicating across the natural silos of information that will emerge in a large organisation such as SAGE has required some new thinking. The main thing to note is the existence of structure that is contingent on the history of how that structure emerged, and the best thing I’ve found for understanding that quickly is just to tap into the collective wisdom that already exists within the organisation. Basically asking people who have been around for a lot longer than I have about how to do things, or who to talk to. That’s mostly been successful. The one time where it didn’t work so well was when I asked someone a few things, only to discover pretty quickly that they only knew fractionally more than I did because they had only been here about a week longer than me!

I’d never known a huge amount about SAGE before starting to think seriously about coming on board. I’d known a few people for a few years, whom I held in fairly high regard. I didn’t know that the name SAGE comes from Sara and George, the founders of the company. Sara is still very much involved in the company, and chairs the board meetings, as well as continuing to take a keen interest in the strategic direction of the company. Since joining I’ve had the pleasure of meeting her a couple of times, and I’ve been hugely impressed with how impassioned she is about the important role that social research can play in society. One moment in particular stands out. It was a few weeks ago, just a few days after the US election. Moods were a little deflated. She stood up at a small meeting and simply articulated the importance of what social science researchers are doing for societal outcomes. She talked about how organisations like SAGE are in a privileged position, and being in that position almost sets a demand on them to do what they can to help support that role of social science.

I think this connects well with another thing that I’ve learnt in the last few months. For the particular project that I’m working on I’m spending a lot of time talking directly to researchers. They uniformly have a positive attitude to SAGE, and the things that SAGE builds in this space. It’s clear that the values of the company really seep into how it acts in the market.

So what is it that I’m doing now? My job title here at SAGE is Head of Product Innovation. For the time being that title sits in front of one very specific project. My main responsibility over the next year is to support the emerging field of what we might loosely call computational social science. Specifically the team I am in are working on finding services that SAGE can partner on, or build. It’s a pure greenfield product development position.

Here I’m not going to get into the nitty gritty of what to call a data intensive way of doing social science (there are subtleties around whether we call it A or B, or some other label), but I’ll tell you what we currently believe, and what we have observed.

We believe that data at scale is transforming many aspects of how social science is done, and with that transformation will come the opportunity to answer questions that were previously intractable, as well as making it easier to tackle currently hard questions. We believe we are seeing the emergence of a new methodology for how social science can be done (but we also believe that this does not remove the need for existing methodologies; rather, it will enhance them). My favourite analogy here is to think of this as akin to the creation of a new kind of telescope or instrument. It opens up new ways of viewing and understanding the world that build upon and broaden what we already know.

We see some groups out there already doing this kind of work, and we see many others who are interested but who face a variety of barriers to starting with these techniques. This is where the project I am working on comes in. Initially we are trying to understand these barriers, and design things that can help reduce them.

There are many, many reasons why this move towards data intensive social science may be important. At a very basic level, expanding the tools available to scholars is always a good thing. Being able to make the most of the implicit data that is now a by-product of the digital interfaces of our lives may move us from a position where we may be haunted by that data to a position where we have the means to understand how to deal with it. I feel that most importantly it’s also about bringing some humanity to the systems that we are building today. These digital systems, and the data that we as a society and as individuals are generating, are determinative of many social outcomes. If the only driver for the creation of these systems is the market then those outcomes are probably not going to be wholly fantastic. (Cathy O’Neil writes about this clearly in Weapons of Math Destruction). To help balance this I think social scientists need a seat at the table when it comes to the design and engineering of those systems.

Over its fifty-year history the publication of methods has been core to what SAGE does. From this perspective, finding a way for SAGE to support the emergence of a new class of methodologies makes perfect sense for us. We are not working in isolation either; rather, we are contributing to a strong trend in a way of thinking about the systems that surround us. We want to help by being an active partner in initiatives that can help with the agenda I’ve outlined above, and where we have the opportunity to build things that help move that forward, we will try to do so.

Our thinking about the kinds of things that we can help to create is still very open ended at this point in time. It is also almost impossible to predict what are the things that you do that will have a real impact, and what are the things that you do that end up not making much difference. What is clear is that you don’t have a chance of finding out if you don’t try. We aim to try, and to learn, and hopefully we can iterate on what we learn to find a way to make a meaningful contribution.

I’ve been asked by a lot of people why I decided to move from eLife to SAGE. I’ve already outlined here a bit about the project that I’m working on. Overall, when I was approached with this opportunity I decided to weigh up three factors: what impact might the project have; what impact might I have on making the project a success; and how would working on this help me support my family.

It was clear to me pretty quickly that this project has the potential to be impactful, and certainly that the motivations behind the instigation of the project were very aligned with my own personal beliefs and interests. I also felt that my background in working on digital tools for researchers over the last few years was a good fit for the needs of the project. This past summer my family grew by one, and now with two young children to juggle (sometimes very literally), the opportunity to work on stuff that matters, and to do so from quite close to home, was one that I had to look very closely at. I made the jump, and now three months in I can honestly say I’m just getting more and more excited by where we are going with the project.

If you have managed to get this far and you are still interested in what we are working on, then maybe you might like thinking about joining us? We currently have a very small team. We decided from the outset to pursue a lean approach to product discovery and development. The team is myself, half of the amazing Katie Metzler and some time from the also awesome Martha Sedgwick.

After a lot of learning in the last three months we have decided that we want to bring in another person full time to help with the development of the project. We are initially setting the position as a 12-month fixed contract, as we have a high degree of uncertainty around what the ideal shape of the team will be a year from now.

Initially we want help with the following kinds of things (at the heart of which is helping us to understand the needs of researchers, and helping us to follow up on the many amazing conversations and leads that we are having right now), but we also have a fair expectation that the role, and the entire project, will continue to evolve over the coming year:

* Assist with market segmentation and market sizing
* Conduct competitor analysis and product positioning
* Recruit a pool of relevant users to test product concepts with
* Conduct solution interviews
* Participate in usability and product concept testing
* Participate in product ideation workshops
* Synthesise and capture feedback from interactions with researchers, and share that feedback amongst other team members
* Provide product development support during the build phase from concept to MVP
* Be a voice for the user through the evolution of our product ideas

If you are interested and want to find out more please reach out to me!

What we mean when we talk about preprints

Cameron Neylon, Damian Pattinson, Geoffrey Bilder, and Jennifer Lin have just posted a cracker of a preprint onto bioRxiv.

On the origin of nonequivalent states: how we can talk about preprints

Increasingly, preprints are at the center of conversations across the research ecosystem. But disagreements remain about the role they play. Do they “count” for research assessment? Is it ok to post preprints in more than one place? In this paper, we argue that these discussions often conflate two separate issues, the history of the manuscript and the status granted it by different communities. In this paper, we propose a new model that distinguishes the characteristics of the object, its “state”, from the subjective “standing” granted to it by different communities. This provides a way to discuss the difference in practices between communities, which will deliver more productive conversations and facilitate negotiation on how to collectively improve the process of scholarly communications not only for preprints but other forms of scholarly contributions.

The opening paragraphs are a treat to read, and provide a simple illustration of a complex issue. They offer a model of state and standing that provides a clean way of talking about what we mean when we talk about preprints.

There are a couple of illustrations in the paper of how this model applies to different fields, in particular, physics, biology, and economics.

I think it would be wonderful to extend this work to look at transitions in the state/standing model within disciplines over time. I suspect that we are in the middle of a transition in biology at the moment.

Textometrica, a tool review

A quick spin with Textometrica

Leviathan Network Image

Yesterday I had a good conversation with Simon Lindgren, the creator of Textometrica. I decided to try out the tool before chatting to him.

Textometrica encapsulates a process for understanding the relationship and distribution of the occurrence of concepts in a body of plain text. It provides a multi-step online tool for the analysis.

The advantage of using this tool is that you don’t need to be able to do any coding to get to a point where you have some quite interesting analysis of your corpus. One potential downside is that the tool is strongly focussed on the specific workflow that Simon devised. When I talked to him later about this it was clear that he built the tool to scratch a specific itch.

In order to try the tool I needed a corpus to work with. I got a copy of Hobbes’s Leviathan from Project Gutenberg, and in a plain text editor I removed the Gutenberg foreword and footer.

I started by just trying to upload the file to Textometrica, and it looked like I’d made the tool hang. At this point I started looking at the 10-minute video overview of the tool, and I discovered that I needed to indicate a text block delimiter within the text. Using the editor I replaced all full stops with the pipe symbol | and re-uploaded, and made much more progress.

If you are interested in exploring the tool I highly recommend working through the video as you get started. The tool is not exactly self-documented, but the video gives a sufficient overview of how to use it.

In under a quarter of an hour I was able to generate a network graph of the largest co-occurring concepts in the Leviathan, and was able to create a public archive of the project.
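To give a flavour of the kind of analysis the tool automates, here is a rough Python sketch of the core idea (my own reconstruction, not Textometrica’s actual code): split the corpus into blocks on the delimiter, then count how often pairs of concepts co-occur within a block. The pair counts become the edge weights of the network graph.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(text, concepts, delimiter="|"):
    """Count how often pairs of concepts occur in the same text block.

    Naive substring matching is used here; a real analysis would
    tokenise the text first.
    """
    counts = Counter()
    for block in text.lower().split(delimiter):
        present = sorted(c for c in concepts if c in block)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

# Each key is a concept pair; each value is the number of blocks
# containing both concepts, i.e. an edge weight for the network graph.
edges = cooccurrence(
    "of the sovereign's power|power and liberty|the liberty of subjects",
    concepts={"sovereign", "power", "liberty"},
)
```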

Each step of the tool has a few custom options, and it seems to me that they were introduced as Simon refined the process while developing it. These do provide the ability to do some fine tweaking of your analysis, but at the same time the options are quite opinionated, so you would want your envisaged analysis to be quite close to the workflow the tool supports.

That said, I was able to accomplish a reasonably complex analysis on a reasonably sized corpus very quickly.

Social Humanities DataHack event

How do people represent themselves on social media, and how are they represented by others? Which qualities and virtues are emphasized (or ignored)? How polarised are these (re)presentations?

There is a workshop looking at this very question happening in Oxford in early January. The morning will be a series of workshops on tools for tackling a question like the above (I’m thinking of attending the Wikipedia and Topic Modelling workshop), and the afternoon will be a hackathon looking at some data sets.

It sounds pretty interesting, and a nice way to warm up for the new year. It’s being hosted by The Oxford Research Centre in the Humanities, and you can sign up on Eventbrite.

Something broke in a Jekyll upgrade (a.k.a, sometimes I hate software)

This is a short story about software, some of the things I hate about it, my lack of knowledge of ruby, and a desire to own my own words.

For various reasons I’m working on a brand new machine, and I decided that I want to start posting to my own blog again (as well as cross-posting to Medium, because fuck it, why not).

That involved dusting down my Jekyll site and seeing if I could get it to work again.

It’s been a while, mind, so the very first thing that I did was go and pull my blog content down from Github and fire up Jekyll.

Jekyll has moved on since I last used it, and I discovered that the mechanism I was using to create an index page of my tags no longer works. The following call in the Rakefile I was using has been deprecated, and now throws an error.

	site.read_posts('')

I thought: it’s one line, how hard can it be to fix? Of course, the dirty little secret is that I don’t know Ruby; I’d just been using Jekyll in the past as a fast way to generate blog content from Markdown. I spent several hours this afternoon trying to track down a short, comprehensible workaround, and I have come to the conclusion that I won’t make progress without learning enough Ruby to become proficient at writing Jekyll plugins, and my life is too short for that.

Writing my markdown in Byword gives me almost instant access to publish to Medium via the publish button, but I want to control my own domain, and I want a git archive of my blog posts too, so what do I do?

I really liked the way that tagging used to work on my site, but I’ve decided that the value of getting it working again is too low, given the time it might take me to work around the issue. I thought briefly about moving to a Python static site generator, but that would involve so much work that it would defeat the purpose of what I actually want to do, which is to blog fairly efficiently.

In the end I decided to change my tagging strategy and create some static tag templates. This post from Mike Apted was easy to follow, and I got it working quickly.
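The static-tag approach is simple enough to sketch: each tag gets its own page whose front matter names the tag, and a Liquid loop pulls the matching posts out of Jekyll’s built-in `site.tags` map. This is my own minimal version of the pattern, not the exact template from Mike Apted’s post (the `default` layout and the `jekyll` tag here are placeholders):

```liquid
---
layout: default
title: Posts tagged "jekyll"
tag: jekyll
---
<ul>
  {% for post in site.tags[page.tag] %}
    <li><a href="{{ post.url }}">{{ post.title }}</a></li>
  {% endfor %}
</ul>
```

Creating one page like this per tag is mildly repetitive, but it needs no plugins and no Rakefile magic, so it should survive future Jekyll upgrades.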

This gets to the nub of my problem with software. I want it to serve me, and mostly to get out of my way, but knowing just enough to have a little bit of control over my environment means I often get seduced by the desire for perfect control. I just need to step back a little and ask: is what I want to do here worth the potential time and effort it might take, compared with finding a solution that is good enough and meets most of my needs?

Covalent Data, first impressions

Covalent Data is a tool for searching over research topics. From what I can tell:

  • It is a database of grants, papers, people, and institutions.
  • It claims to use machine learning to tie these entities together.
  • Search results don’t seem to be exportable; for example, although the grant award results list the amount of each grant, getting a total amount for a search term would mean extracting each data point manually.
  • It is hard to determine what their sources are, and specifically which databases are not covered.
  • For multi-word searches the engine prefers to return “near results” rather than exact matches. It was quite fiddly to force exact matching in the search interface, and I’m not convinced I actually got it to work.

Some example searches:

search term “Digital Humanities”: 16 results, all from the NSF.

search term “computational social science”: 31 grants, 30 publications; again, all NSF-funded grants.

search term “social science” + “big data”: 90 grants, 24 publications.

search term “stem cells”: 76k results.

In contrast, the NSF grant search tool does allow results to be downloaded. It found the following:

search term “computational social science”. 13 grants

search term “digital humanities”. 23 grants.

search term “stem cells”. 632 grants.

I was puzzled as to how Covalent Data found so many more grants for stem cells, when both the NSF and NIH report far fewer.

Overall I’d like to see more grant agency coverage, more clarity around how results are generated, and the ability to export. I could see Covalent Data becoming a useful tool at some point in the future, especially if I felt it was taking away the pain of having to go to many different sources to find grant funding information. Right now I’m not sure I trust it, and the results, as returned, are a bit hard to work with for onward analysis.

Goodbye eLife!

So after nearly four and a half years I am moving on from eLife. I’ve had an amazing time, worked with some amazing people, and we have gotten a few really nice things done.

First off, we are hiring a replacement for my role; this is an amazing opportunity to effect real change in scholarly publishing. eLife has just announced follow-on funding of £25M to sustain us through to 2022. We have a great dev team, we have buy-in from a hugely respected editorial board, and our submissions are going from strength to strength. Being open, and making the software we build open source, is baked into our culture. If you are excited by the possibilities, then do think of applying!

Secondly, I’d like to cover the reason why I’m leaving. Eight weeks ago my wife delivered our beautiful little daughter Laira, and now we are juggling two little ones at home. I live in London and commute most days up to Cambridge to work at eLife. Though I was not in the market for a new job, when an opportunity came up just around the corner from where I live, it was something I had to think seriously about. After thinking deeply about it, I decided that this new opportunity was both sufficiently exciting and could give me the ability to support my family in a way that is just not possible with a commute to Cambridge as part of my daily routine. I am a strong believer in putting family first in these decisions. Life is not a rehearsal for some time when we will get to do it all again in the future, but better. It’s also not lost on me that when I moved from Mendeley to eLife, that move was driven by the upcoming birth of our son. I have been extraordinarily lucky to have had such great opportunities that allow me to support my family in this way, while at the same time pursuing work that is exciting and impactful.

Finally, I’d like to look back over my time at eLife and give a personal reflection on what we have achieved. If I count correctly, I think I was the eighth person to join the team (not including our amazing editor-in-chief Randy). At the time of joining I wrote: "With eLife I'm convinced there is an opportunity to make a contribution and an impact too. It's in front of us now, and we have the opportunity to do something great." Well, four years on, I think we have done something great. When I joined, eLife was literally a blank sheet of paper. One of the first things we did, even before launching our journal platform, was to start attracting submissions and, as soon as they had been submitted, posting them directly to PubMed Central with no delay. At the time this caused fury in some areas of the publishing world, but it was the right thing to do for researchers and for science, and I think it set out our marker that we were not simply interested in doing things the same old way.

Those manuscripts went through the eLife peer review process, a collaborative process that almost totally eliminates the third-reviewer problem. There are not a huge number of innovations happening in peer review at the moment, so I think what eLife has done here is really laudable, and it’s great to see it getting traction in some other journals.

Of course I was brought on board to lead the technical development of the journal, and in December 2012, with HighWire as a partner, we launched the eLife journal website on an incredibly ambitious timescale. At the time it was widely lauded as having the clearest layout for a scholarly article page of any journal on the web. We brought videos inline, made a good effort to stop abandoning supplementary materials to the ghetto of the page footer, and made it far easier to see related images than had ever been done before. A lot of that was due to working with a design agency that had no previous experience in the scholarly space (http://ripe.com) and to tackling the problem as a straight-up UX problem, rather than a problem specific to research. Hand in hand with that went a focus on getting as much value as possible into the XML, including funding information and well-structured contribution information for authors. We have posted our sample XML on GitHub, and we push the XML of all of our journal articles to GitHub, along with some nice tools for parsing them.

One of the most fun products we worked on was eLife Lens. We did this in collaboration with Ivan Grubisic, who had the idea, and the http://substance.io team, who added a great deal of coding and thinking muscle to the project. Lens has gone on to be adopted by a number of other publishers, and I’m excited about its future.

Another great highlight of my time at eLife was when Randy won the Nobel Prize in 2013. Amazingly, we were among the first people he called, even before the news had been press-released. The traffic to the journal got a nice spike that week.

Earlier this year we hit another big milestone when we took over full hosting of our own content, along with developing the production system behind the scenes that powers it. We are going to open source eLife Continuum in the next couple of weeks, and that’s one of my big remaining jobs to get done before I move on.

All of this technical development happens in support of science, and the development team’s mission at eLife is:

	To build a platform for research communication that embodies the best practices of open development and that treats its users with respect. 

It’s been incredible to see the support that eLife has from the scientific community. We are now receiving well over 600 submissions per month, and the quality of the research we are publishing is fantastic, from RNA editing, to work on the Zika virus, right through to the discovery of a new species of Homo. The papers we publish make their data open, and their reviews open too. It’s just fantastic to see the scholarly community embracing such transparency, and it’s making a real impact on improving the way science is done.

On a more personal level, all of these things were achieved through the dedication and hard work of a large number of people. I had the great pleasure of working with an amazing team, and also working with some amazing partners over the last four years. When I was evaluating joining eLife I got in touch with a mutual friend who had worked for my boss Mark. He said that Mark was the best boss he had ever had, and if I ever got a chance to work for him, I should jump at that chance. Well, jump I did, and I can very gladly report the same. I’ve learnt so much working for Mark, and I hope to carry some of that with me in my future career.

The technical team that we have built up over the last four years are amazing, and I’ll definitely miss working with David, Nathan, Sian, Chris, Luke and Giorgio. I’ll miss the great interactions with the planning team, the support from eLife, and the general sense of camaraderie within the office.

I am so incredibly proud and humbled to have been a part of this initial journey for eLife. Knowing what I do about what is coming down the line, I’m just really excited for the future of the journal and the continued impact that eLife is going to have. I feel that it couldn’t be in a stronger position right now, and I wish whoever comes in to shape the role I’m leaving as much fun and enjoyment as I’ve had over the past four years.