Partially Attended

an irregularly updated blog by Ian Mulvany

blog posts about data

Using Datasette to publish data

In my department we have started to make a bit of space to allow for self-learning to happen. I took the time to look at https://datasette.readthedocs.io/en/stable/, an ecosystem of tools that support data publishing. These tools are from Simon Willison, and they are fantastic. I’d been meaning to look at them for some time now. I used a Jupyter notebook to work my way around getting some data together, and working with the tool. ... (more)

STM Research data workshop.

The start of December is always a busy time for news in the STM / Product space. There is the annual STM meeting in London, and AWS re:Invent also kicks off this week. As a result, within just a few days, I find that I have more things to write about than I can possibly get through before the end of the year. Still, we must plough on, and plough on we will. ... (more)

Google data set search.

I’ve just got back from a fantastic workshop looking at infrastructure for research data discovery. I’ll blog about the workshop in due course, but I was asked to comment on Google Dataset Search. I had the chance to meet Natasha Noy from Google, who is behind the service. As with many Google services, it has been created by a small team, but with the underlying web-scale infrastructure of Google to build on top of. ... (more)

Belmont Forum Round Table - data accessibility statements

Yesterday I attended a round table discussion hosted by the Belmont Forum about the release of their position on data accessibility statements and digital objects management plans. (It’s a bit of a mouthful, but the reason is that they are aiming to be clear and comprehensive around what they are asking to make it easier for researchers, publishers and other stakeholders to get to compliance around this policy.) You can read their position paper — Draft DAS Statement and Policy for October 2018 Plenary - Google Docs. ... (more)

Internet Archive, Code for Science and Society, and California Digital Library to Partner on a Data Sharing and Preservation Pilot Project

Some major players are getting together to trial decentralised data sharing using the Dat protocol - https://datproject.org. Dat for me is one of the dark horses of the infrastructure landscape. It has great power, some amazing developers, has already created some great value, and is still not widely known. Also, this is not blockchain based, and yet manages to be decentralised. Those of you who know me will know why that pleases me. ... (more)

A new way to make biological data more easily citable

Working under Force11, the California Digital Library and the EBI have made progress on making identifiers to data held in biological data repositories more easily resolvable. They have done this by setting up infrastructure and standards to support creating globally unique prefixes for the data repositories involved. You can see some examples in this table: Table 2: Examples of persistent, citable URLs for a single accession (NCBI Taxon 9606), with default and specified providers. ... (more)

thoughts on the ERC data workshop

On Thursday and Friday of last week I attended a European Research Council workshop on managing research data. It was well attended, with about 130 participants bringing views from across the academic disciplines. I’ve blogged my raw notes from day one and day two. In this post I reflect on the points I noticed that were raised over the two days. People have been talking about the increasing importance of research information for many years now, and a hope was raised in the opening comments that, by the end of the workshop, we might be able to provide solutions to the problems posed by research data. ... (more)

ERC data management workshop, day 1

Initial thoughts about the workshop. Opening remarks. Setting the scene. Sabrina Leonelli - the epistemology of data-intensive science. Dr Hans Pfeiffenberger - Open Science – opportunities, challenges … @datasciencefeed. Bernd Pulverer - finding and accessing the data behind figures. Dr Roar Skålin - Norwegian researchers want to share, but are afraid of jeopardising their career. Summary of points from the scene setting. Afternoon breakout session - Life Sciences. ... (more)

ERC data management workshop, day 2

Life sciences breakout - key points. Physical sciences breakout - key points. Humanities breakout - key points. Open discussion on morning presentations. Breakout session on incentives. Paul Ayris - Implementing the Future: the LERU roadmap for research data. Sünje Dallmeier‐Tiessen - Incentives for Open Science Attribution, Recognition, Collaboration. Veerle Van den Eynden and Libby Bishop - Incentives for sharing research data, evidence from an EU study. Open discussion after breakout session. ... (more)

STM innovations seminar, London, 2013

Today I’m at the STM innovations seminar. The Twitter tag for today is #ukinno. The program is online. I’m going to take a light approach to blogging today; I’ll probably hang out mostly on Twitter. 9.35 The Research Data Revolution, Sayeed Choudhury, Associate Dean for Research Data Management, Johns Hopkins University: Data has become a major topic of interest from all sectors of society with headlines such as “Data is the new oil” to assertions from McKinsey that data is the fourth factor of production. ... (more)

ENCODE - an example of open publication and data integration.

On Monday the 14th of January we met at the PLOS offices in Cambridge to hear a talk from Ewan Birney on lessons learned from publishing data-rich publications through the ENCODE project. This was the first time that Ewan was far less worried about the print, and far more worried about how well the online version was going to work. Dimensions of the project: 5 terabases; 1715 times the size of the human genome; 3k experiments; 410 authors on the main paper; 6 high-profile papers; ~35 companion papers. The output should not be thought of as papers, but as the raw data. ... (more)

Product Tank 6, big data

I’m at ProductTank 6. Today’s topic is big data. Heather Savory - Open Data User Group. She is the chair of the Open Data User Group. She is using a recipe analogy for the use of data in the creation of products - data is an ingredient, but you have to get the ingredients right. (There are a number of people in the audience who work with public sector data, and more people who work with open data, but most people know about this, but they don’t use it). ... (more)

Data literature integration workshop.

Literature Data integration workshop, 2012-10-10. There are many people attending today. I’m not sure whether the attendee list will be released. I’ve taken notes, but rather rapidly. I may have ended up misattributing comments, missing comments, or inserting comments that didn’t happen, so take the below with the appropriate health warnings. I still need to check on the links, and pull together a link list at the bottom, but I wanted to get the write-up out while it was still in front of me. ... (more)

example title

Back in 2010 I was pointed at Zanran, a search engine that looks for data in graphs. I’ve been playing around with some sample searches, and decided to try “drowning deaths swimming pool”. Incidentally, all of the results that came back were figures for young children. I guess this is because that is the group where there is the most concern, and so there is more published data about this group. ... (more)

mini-review of "Measuring the User Experience on a Large Scale"

Measuring the User Experience on a Large Scale: User-Centered Metrics for Web Applications, by Kerry Rodden, Hilary Hutchinson, and Xin Fu. Rodden, Hutchinson and Fu describe a framework for measuring the user experience of web apps through mining server logs. Since they are based at Google, one assumes that the framework that they are describing has been battle tested. They focus on metrics that measure user-centric aspects of the online experience, in contrast to business-centric ones (such as PULSE metrics: Page-views, Uptime, Latency, Seven day actives, Earnings). ... (more)

Cancelled :( Drinks with Chris Wiggins.

CANCELLED: Due to flight cancellations out of NY, Chris won’t be making it over to this side of the pond, so sadly we will be calling the event off. London is a great city to live in if you are interested in technology. It has to be one of the best cities in the world for people working on the interface between science, tech and the web. Tonight I’ll be going to the awesome Same As Christmas Quiz, and tomorrow night Chris Wiggins will be passing through from New York, and there will be a chance to meet up with him for a pint and a chat. ... (more)