RFC: JATS as a source of names for things and more

This issue is a 'move' of an email discussion with @MelissaHarrison and @BlueReZZ from eLife about the ways we could use JATS.

Here's how it's gone down so far (hope this is usefully formatted):

from Jure

So, help me think through this. I think there are two things at play in the relationship between xpub and JATS: 1. storing article content and all required metadata in JATS, 2. using JATS as a source of vocabulary for things.

I think what we agreed at the meeting was more number 2 than number 1. For example, my concern, and the reason I raised it for discussion, was that across different xpub apps we shouldn't have things that are semantically the same but named differently: a Paper, an Article, a ResearchPaper, a Manuscript etc. Naming things is hard, and we have the luxury of an existing vocabulary, namely JATS, that has a high level of adoption in this space. So why not use it? In the example above, just call those things Article. That was my question at the meeting and I think the parties involved agreed that it makes sense.

As for number 1, representing data and metadata in actual JATS is a different matter. I think some organizations that we work with might be less set on using JATS throughout their workflow, as long as exporting to it is easy at any point. In any case, I think number 1 is a different and broader discussion, but definitely one worth having.

What do you think?

from Melissa

I think there are 3 things:

1. storing article content and all required metadata in JATS
2. using JATS as a source of vocabulary for things
3. How do we define the data model for actions, processes, and notes that are not currently catered for in JATS?

So, I reviewed eLife's initial submission screen wireframes with Nick, and there is a significant amount of detailed information there that needs to be recorded (and potentially sent onwards) but does not fit the current JATS model. I've been talking to others in the publishing space who have long and good experience of modelling data about papers as well as modelling the "content" that is published. All of this is integral to the publishing process but not necessarily to the published information.

I want to think ahead so that any information we collect that may be published in the future is modelled somehow in JATS.

The issue of annotations is one of these - as peer review comments and documentation are becoming more and more open, and publishable, I think we should prepare for this future by modelling commenting in JATS. We have interest from great XML minds in being involved in this, which we should capitalise on. But I'm happy if we do that as an eLife rather than a Coko thing.

from Paul

Seems sensible to me, and I agree there are 3 steps to this, none of which should delay the others. At the meeting last month, I was thinking the xPub community needed a Ubiquitous Language (as Eric Evans describes it) that would extend beyond the developer/user relationship - the naming in JATS is a great starting point for that. We should then extend our ubiquitous language to areas not covered by JATS, and that can form the basis of updates to JATS in the future. Presumably, that's when the minutiae will be debated.

It feels like we need something more concrete before getting too many others involved, so the approach of setting a community standard for these new bits first, then having that emerge as a de facto standard seems sensible. You're right about the slow movement though, we shouldn't wait too long. The priority should be that eLife, Hindawi, Collabra, Wormbase and EuropePMC all call a manuscript a manuscript when it's stored, and know the synonyms used for display too. I've developed ubiquitous languages before (usually as structured wiki pages) and it really helps.

I'll bring this up at the next YLD/eLife workshop tomorrow as there will be talk about data models.

With this preamble, I open the floor for discussion!
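
To give the discussion something concrete to poke at, here is a rough sketch of what the "source of vocabulary" idea could look like as data: one canonical name per concept, plus the synonyms that different apps or publishers use for display. Only the name `article` comes from JATS (it is the root element there); the `VOCABULARY` table, the `canonical_name` helper and the choice of Python are all made up for illustration, not a proposal for how xpub should implement it.

```python
# Hypothetical shared vocabulary: canonical (JATS-derived) name -> synonyms
# that individual apps might use. The entries mirror the example names from
# the thread above; none of this exists in xpub today.
VOCABULARY = {
    "article": {"paper", "researchpaper", "research paper", "manuscript"},
}


def _normalize(term: str) -> str:
    """Compare terms case-insensitively and ignore surrounding whitespace."""
    return term.strip().lower()


# Reverse index so that any known synonym resolves to its canonical name.
_CANONICAL_BY_TERM = {}
for canonical, synonyms in VOCABULARY.items():
    _CANONICAL_BY_TERM[_normalize(canonical)] = canonical
    for synonym in synonyms:
        _CANONICAL_BY_TERM[_normalize(synonym)] = canonical


def canonical_name(term: str) -> str:
    """Return the shared name for a term, or raise KeyError if it is unknown."""
    try:
        return _CANONICAL_BY_TERM[_normalize(term)]
    except KeyError:
        raise KeyError(f"{term!r} is not in the shared vocabulary yet") from None


if __name__ == "__main__":
    for name in ("Paper", "ResearchPaper", "Manuscript", "Article"):
        print(name, "->", canonical_name(name))
```

An unknown term raises an error rather than passing through silently, on the assumption that a missing entry is exactly the kind of gap we would want to surface and discuss.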