advice on publishing research online

in publishing, science, software carpentry

I posted this as a comment on the thread over at Software Carpentry, in answer to the question “What do we teach about writing/publishing papers in a webby world?”

I ended up writing a bit more than I expected, so here are the main pieces of advice:

- use a reference management tool
- try to find the fastest venue to publish in
- try to publish in an OA journal
- have a look for the best preprint server for your discipline, and add your work there too (might be a university archive)
- add as much supporting material as you can to the right locations, e.g. GitHub for code, Figshare for anything, Vimeo or YouTube for videos
- do register for an ORCID and add your newly minted publication to your ORCID profile
- don’t be afraid to screw around with copyright transfer statements
- use version control for your own sanity
- remember that all the time you spend formatting your paper prettily will be ignored and thrown away by large publishing companies, especially the work you do on reference formatting, so don’t do it
- if the collaborative environment of your choice is not working for the group, be pragmatic, drop it, get the damn paper finished already

I would start by advising people to keep in mind the goals of publishing. You want to get your work out into a venue that will be respected by your peers, and noticed by them. In most cases - but not all cases - this will be a journal published by one of the large STM publishers. Elsevier, Springer, Wiley, Taylor & Francis, PLOS and Sage represent a very large part of that market.

You want this process to happen as quickly as possible. Aside from the act of writing, and constructing your story, the act of publishing (getting it onto the web) is pure schlep. Every extra minute you spend in this process is a minute wasted: it adds no value to your research, nor to your ability to secure the resources you need to do the research you are interested in.

Your first priority is to understand the most appropriate venue and then understand the system that this venue uses to get the work online. Tailor your process to lower the friction between the artefact you create and the process that will be used to get it online.

The great failure of my industry in the face of the web has been to allow this process to remain orders of magnitude harder than publishing a post on Blogger or WordPress.

I’ll step through some advice covering these topics now.

## The most appropriate venue

Ask your colleagues and confer with your coauthors; it’s usually not hard to determine. A tool like the Journal/Author Name Estimator has been around for years, and it can suggest a journal based on the text of your abstract. Other resources, such as Journal Finder, can also help. Most of these are for the life sciences.

If you are publishing in an OA journal, the Eigenfactor Journal Rank tool will tell you whether you are getting good value for money. It weighs the article processing charge against the journal’s rank, as determined by Eigenfactor’s own algorithm.

## Speed of publication

It might be worth checking if there is an alternative venue that might be a lot faster than your first choice.

A common approach is to submit to a high-profile journal and, on rejection, submit to PLOS ONE. This is done to reduce the thrashing around within the peer review system. Perhaps consider submitting to PLOS ONE first? You could also look for a smaller journal that might be more responsive. In the life sciences, the journal I work for, eLife, is both prestigious and fast.

For the life sciences, Anna Sharman has a great resource covering a selection of journals, with information about decision times, OA charges and journal metrics.

It might be interesting to encourage people attending your courses to contribute to these, or to create similar resources for their own disciplines.

## Preprint servers / archives

Your discipline may have a discipline-specific archive. Make sure a copy of your work is deposited there. If the full text is deposited in one of these venues, Google Scholar will be able to provide readers with a link to a full-text version of your article, even if you have had to publish in a paywalled journal.

Often you can post a draft of your work there before the peer review process is complete (if that’s considered kosher in your field). This can give you priority on an idea, even before the idea has been formally reviewed.

Also, check with your university library to find out what archives they run, and deposit there for the same reasons as above.

## The OA advantage

Keeping control of your own content is a significant advantage that authors can derive from publishing in an OA journal. I’ll touch on that a bit later.

There is another advantage, and that’s the advantage of discoverability.

Currently, as of writing this post, the main Google search bot does not index content that is behind an academic paywall for users who do not have access. That means that if you publish in a non-paywalled venue, more people have a chance to find your content.

Most of your immediate peers will probably be able to access your content anyway, because it is in the appropriate venue or in an appropriate repository, but it can’t hurt to make it even easier to find.

If your coauthors will not agree to publishing in an OA venue, you can always try to modify the copyright transfer agreement that the publishing company will ask you to sign.

You can follow these examples to retain the right to distribute the paper in any way you see fit. This is the one piece of advice I’m giving that might slow down the process of publication, but go on, you know you want to do it, don’t you?

## What happens to my paper in a big publishing company, and why should I care?

During the reviewing stage, a very badly formatted version of your article will be created and sent to your reviewers. If you have a preprint of your article available, that might even be an easier artefact for the reviewers to use, and it might speed up the review process, though I don’t have any evidence to suggest that it will.

If your manuscript is accepted for publication, it will be sent to a large typesetting company, where it will be digitally torn apart and converted to XML. All of the formatting that you do on figures, text and reference lists will be thrown away. I’ll just say that again. All of the hours you spend carefully formatting your reference lists will be ignored as the content goes through an automated typesetting system. (That’s why at eLife we don’t have a prescriptive requirement on the format of the references we are sent; we will take them in any format.)

All of your specially chosen fonts and special text alignment will mostly be ignored.

Depending on the state of the manuscript and the quality of its language, it may be checked by a copy editor, either for the journal’s internal style or for the quality of the language. Much of this work is undertaken by highly educated graduates in developing countries, particularly India, the Philippines and, increasingly, China: globalisation in action.

Why is this? For the most part, the systems that run our global publication infrastructure are old; many of them have code bases more than 20 years old. Back in the day XML was the only reliable transfer format, and it remains the industry standard today. The XML that publishers use has been evolving slowly, and under gentle pressure to deposit into PubMed and PubMed Central, most publishers and typesetters are starting to target one of the many dialects of the NLM DTD. This has become a de facto standard in the industry; however, no writing tools export natively to this format, and the DTD supports, and is designed for, archiving print material. One of the many consequences is that code typeset in this DTD is usually typeset as dumb text. On the other hand, it does allow a resource like PMC to archive millions of articles from thousands of publishers and provide a very fine-grained search interface on top of all of that content. I’ll mention writing tools a little later.
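To make the point about references concrete, here is a minimal Python sketch of roughly what a reference becomes once it is in publisher XML: a `<ref>` holding an `<element-citation>`, using element names from the NLM/JATS tag set (JATS is the standardised successor of the NLM DTD). The helper `to_jats_ref` and all field values are hypothetical placeholders, not any typesetter’s actual pipeline, and a real system would first have to parse your free-text reference into these fields. Whatever fonts, italics or punctuation you applied, only the fields survive.

```python
import xml.etree.ElementTree as ET


def to_jats_ref(ref_id: str, fields: dict) -> str:
    """Build a JATS-style <ref> element from already-parsed reference fields.

    Hypothetical helper for illustration only.
    """
    ref = ET.Element("ref", id=ref_id)
    citation = ET.SubElement(ref, "element-citation", {"publication-type": "journal"})

    # Authors become surname / given-names pairs inside a person-group.
    group = ET.SubElement(citation, "person-group", {"person-group-type": "author"})
    for surname, given in fields["authors"]:
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = surname
        ET.SubElement(name, "given-names").text = given

    # Each remaining field maps onto one element; none of the manuscript's
    # visual formatting is carried across.
    for tag in ("article-title", "source", "year", "volume", "fpage", "lpage"):
        if tag in fields:
            ET.SubElement(citation, tag).text = str(fields[tag])

    return ET.tostring(ref, encoding="unicode")


# Placeholder values, not a real publication.
print(to_jats_ref("bib1", {
    "authors": [("Doe", "J")],
    "article-title": "An example article title",
    "source": "Example Journal",
    "year": 2013,
    "volume": 2,
    "fpage": 1,
}))
```

The same output results whether the submitted reference was in Vancouver, Harvard or any other style, which is why time spent polishing reference formatting is wasted.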

To potentially reduce the time it takes to review your manuscript, and the time your manuscript spends in the copy editing / typesetting process, the following things could help:

Remember, this is probably a lifestyle choice; my main advice is to pick a tool that does not have too much lock-in. I used to work at Mendeley and believe it to be as good as any tool out there.

## But wait! I want to do IPython, interactive, open data, virtual machines, 3D printed DNA dinosaur replication and what you have just told me sounds like I can’t do that - that sucks :(

Yes, yes, it does suck, and I hear what you are saying, but remember, at the moment of publishing your priority is to get the damn work published, and unfortunately that still means interacting with a system that has changed little since the late 17th century. There are moves in the right direction, oases of sanity, but there is a long, long way to go.

If you feel really passionate about this then the best thing you can do is to keep the rights to your own work, get the paper out as a CC-BY paper in a boring old venue, and then do the kind of publication that you really want to on your own academic home page, and build your own audience around your work that way. In that case you want the boring route to take up as little time as possible.

You should also deposit artefacts of your paper in the best possible place for them. Code to a location like GitHub. Videos to YouTube or Vimeo. Images to Flickr. Data to Figshare, DataDryad, Zenodo, or one of the very many other subject-specific data repositories that may be appropriate for your field.

Try to keep your artefacts well organised and backed up off your machine. You can back a lot up to GitHub as part of a git repo, but that’s not its main use case. You can use a service like Evernote, or get a licence for a research-specific asset management tool like Projects or LabArchives.

The aim here is to reduce the friction in getting instances of these resources into the hands of others - if you believe that this is a critical part of doing research.

It also makes it possible to recover this information if you lose your main machine. (I decommissioned my main machine last summer via a cup of coffee.)

For the purposes of archiving your work you should also check with your institution and library to see if they can provide support or systems. Librarians in many institutions are mustard keen to help, as it provides a way for them to prove value to the academy in a world in which library subscriptions are under extreme pressure. You may find yourself with the problem of having too many options - which is not a bad problem at all.

## Authoring tools, and why does this all suck so much?

I noticed that there was some discussion in the thread about collaborative tools for authoring. Again, I’ll just stress, get the work published as soon as possible. This might mean sending a PDF of the article to a publishing house, or having to just send in a Word file.

On the other hand, there is a new generation of online tools emerging for writing, and also tools emerging for writing on the iPhone and iPad. I think we have more viable options at our fingertips now than at any time in the past. I don’t believe there are any serious contenders yet ready to oust the Word/LaTeX duopoly, but it would not hurt to take some of the following for a test drive to help with the authoring experience. It’s too broad a topic to go into a detailed review of each one, so I’ll leave an investigation of these tools as an exercise for the interested reader. The list below is just a sample; there are a bunch of others out there.

The tool that I see emerging at some time on the horizon, and that I am very excited about, is the work on the Substance reader and composer and eLife Lens. What’s really nice about this is that to get started you can import NLM XML directly, or markdown via pandoc. It does a great job of separating the view, logic and control of the writing experience, so it should also be possible to write directly in the browser and export directly to a publication-ready format, but some work remains.

In my own ideal world you can submit an idea to a journal as part of a pull request to the publication, and peer review takes place in some system similar to how we do code review today. On acceptance, the full digital artefact is published instantly. The writing and collaboration happen in almost any tool the user likes, and modifications are synced via something like Dropbox. In this world writing tools support offline as well as online modes, and content, logic and views can be assembled independently. In my ideal world the source is open. We are a little way from that at the moment, but there is no doubt in my mind that we are moving in that direction. This great post by PLOS has some great insights on what the native format for publishing on the web should be.

## About this post

As we are discussing publishing on the web, I thought it might be useful to describe the tools I used to write this post. The body of the text is stored on my machine as a plain text file, and I store all of these in one directory using nvALT to manage them. This directory is also held under a Dropbox account, and I can access the content from my iPhone through a variety of editors, but in this case I didn’t use any of these.

For writing this post I used WriteRoom for Mac in distraction-free mode. I often use SublimeText in distraction-free mode too. For some formatting shortcuts I used TextExpander. To format the links, I wrote the post in markdown and did the formatting in SublimeText. I previewed the post using Marked, and also used Marked to verify that all of the links were working at the time of writing. To publish the post on my blog, I posted it directly into a GitHub repo, using GitHub Pages to render the content. You can see the result at … . I used the GrabLinks bookmarklet to gather all of the links from this post into a resources list at the end of this post.
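GrabLinks works in the browser; for a post written in markdown, a rough stand-in (not how GrabLinks itself works) is to pull the links straight out of the source text. The minimal Python sketch below assumes inline links of the form `[text](url)`; the names `collect_links` and `resources_section` are hypothetical, and the regex deliberately ignores reference-style links and URLs that contain a closing parenthesis.

```python
import re

# Matches inline markdown links: [link text](url)
# Assumptions: no nested brackets in the text, no ')' inside the URL.
LINK_RE = re.compile(r"\[([^\]]+)\]\((\S+?)\)")


def collect_links(markdown: str) -> list[tuple[str, str]]:
    """Return (text, url) pairs in order of first appearance, one per URL."""
    seen = set()
    links = []
    for text, url in LINK_RE.findall(markdown):
        if url not in seen:
            seen.add(url)
            links.append((text, url))
    return links


def resources_section(markdown: str) -> str:
    """Format the collected links as a markdown bullet list."""
    return "\n".join(f"- [{text}]({url})" for text, url in collect_links(markdown))


post = (
    "Code goes to [GitHub](https://github.com), data to "
    "[Zenodo](https://zenodo.org), and again [GitHub](https://github.com)."
)
print(resources_section(post))
```

Deduplicating by URL keeps the list short when, as here, the same service is linked more than once.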

## Final thoughts

I realise that I have mostly been answering the question of what people should know about the world as it is now, and not so much about what tools or approaches we should advocate to make the world a better place, but I hope that having a clear view of what is bad can help people make pragmatic decisions about how to change things for the better.

## Resources