Closed
Open
Created Feb 11, 2019 by Alex Theg (@atheg, Owner)

Word and Zotero citation formats

The two citation formats that would be good to capture are:

  1. References added with Word's own References tool
  2. Zotero references in Word

This ticket describes the format of both kinds of references so that we can decide on a target output format.

MSWord References

References added from within recent versions of Word use the following format:

Inline reference callouts from word/document.xml

The visible text in a Word document looks like this. The parenthetical is an inserted citation.

[Screenshot: citation_screen]

Users choose one reference style for the document (APA, Chicago, etc.), which determines the format both of the bibliography and the inline reference callouts.

The OOXML for a citation is as follows:

<w:sdt>
  <w:sdtPr><w:id w:val="836273690"/><w:citation/></w:sdtPr>
  <w:sdtContent>
    <w:r><w:fldChar w:fldCharType="begin"/></w:r>
    <w:r>
      <w:instrText xml:space="preserve">CITATION Sam02 \l 1033
      </w:instrText>
    </w:r>
    <w:r><w:fldChar w:fldCharType="separate"/></w:r>
    <w:r>
      <w:rPr><w:noProof/></w:rPr>
      <w:t>(Paul, 2002)</w:t>
    </w:r>
    <w:r><w:fldChar w:fldCharType="end"/></w:r>
  </w:sdtContent>
</w:sdt>

I believe the Sam02 tag in this block, <w:instrText xml:space="preserve">CITATION Sam02 \l 1033</w:instrText>, is what connects the callout to the corresponding full citation and its data.

The above text is from the word/document.xml file. The full citation with all its data generally lives in a file called customXml/item1.xml. There's a bit more to it than that, but we'll consider it true for the purpose of this ticket.
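As a rough illustration of how the callout could be tied back to its source, here is a minimal Python sketch that pulls the tag out of the CITATION field instruction. The sample is a trimmed copy of the markup above with the standard WordprocessingML namespace declared; the function names `citation_tags` and `visible_text` are hypothetical, and the sketch assumes the whole instruction sits in a single `w:instrText` run, which real documents do not guarantee.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed copy of the citation markup from this ticket, with the namespace declared.
SAMPLE = f"""
<w:sdt xmlns:w="{W_NS}">
  <w:sdtPr><w:id w:val="836273690"/><w:citation/></w:sdtPr>
  <w:sdtContent>
    <w:r><w:fldChar w:fldCharType="begin"/></w:r>
    <w:r><w:instrText xml:space="preserve">CITATION Sam02 \\l 1033 </w:instrText></w:r>
    <w:r><w:fldChar w:fldCharType="separate"/></w:r>
    <w:r><w:rPr><w:noProof/></w:rPr><w:t>(Paul, 2002)</w:t></w:r>
    <w:r><w:fldChar w:fldCharType="end"/></w:r>
  </w:sdtContent>
</w:sdt>
"""


def citation_tags(sdt):
    """Return the source tags named in CITATION field instructions (hypothetical helper)."""
    tags = []
    for instr in sdt.iter(f"{{{W_NS}}}instrText"):
        parts = (instr.text or "").split()
        # Assumed field code shape: CITATION <tag> [switches...]
        if len(parts) >= 2 and parts[0] == "CITATION":
            tags.append(parts[1])
    return tags


def visible_text(sdt):
    """Return the rendered callout text, i.e. the concatenated w:t content."""
    return "".join(t.text or "" for t in sdt.iter(f"{{{W_NS}}}t"))


root = ET.fromstring(SAMPLE)
print(citation_tags(root))  # ['Sam02']
print(visible_text(root))   # (Paul, 2002)
```

Keeping both the tag and the visible text would let a converter preserve the author's rendered callout while still recording which source it points to.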

Full citation data

Citation data, stored in a customXML file, takes the general format of the following example:

<b:Sources SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
  <b:Source>
    <b:Tag>Sam02</b:Tag>
    <b:SourceType>Book</b:SourceType>
    <b:Guid>{A7E1436C-B7FE-2441-A039-CD8DA09E5981}</b:Guid>
    <b:Author>
      <b:Author>
        <b:NameList>
          <b:Person>
            <b:Last>Paul</b:Last>
            <b:First>Sam</b:First>
            <b:Middle>Jim</b:Middle>
          </b:Person>
        </b:NameList>
      </b:Author>
      <b:Editor>
        <b:NameList>
          <b:Person>
            <b:Last>Zimms</b:Last>
            <b:First>Steven</b:First>
            <b:Middle>Carl</b:Middle>
          </b:Person>
        </b:NameList>
      </b:Editor>
      <b:Translator>
        <b:NameList>
          <b:Person>
            <b:Last>Slims</b:Last>
            <b:First>Handy</b:First>
          </b:Person>
        </b:NameList>
      </b:Translator>
    </b:Author>
    <b:Title>Cell</b:Title>
    <b:City>Davis</b:City>
    <b:StateProvince>CA</b:StateProvince>
    <b:CountryRegion>USA</b:CountryRegion>
    <b:Publisher>Avid</b:Publisher>
    <b:Year>2002</b:Year>
    <b:Volume>1</b:Volume>
    <b:NumberVolumes>1</b:NumberVolumes>
    <b:Pages>107</b:Pages>
    <b:ShortTitle>C</b:ShortTitle>
    <b:StandardNumber>2</b:StandardNumber>
    <b:Edition>Fourth</b:Edition>
    <b:Comments>Bob Loblaw's Law Blog</b:Comments>
    <b:RefOrder>1</b:RefOrder>
  </b:Source>
  ...
</b:Sources>

All of the reference data lives inside a <b:Sources> element. The <b:Sources> start tag declares the namespaces and records the bibliographic style selected for the Word document. The chosen reference style doesn't affect the format or content of the underlying source data.

<b:Sources
  SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA"
  xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
  xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
>

or, as another example:

<b:Sources
  SelectedStyle="/CHICAGO.XSL" StyleName="Chicago"
  ...continues as above
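Since the style is only recorded as attributes on the wrapper element, it can be read without touching the source records. A minimal Python sketch, assuming the sources XML is already available as a string; `selected_style` is a hypothetical helper name, and the two inputs are empty `b:Sources` elements built from the attribute values shown above.

```python
import xml.etree.ElementTree as ET

B_NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"


def selected_style(sources_xml):
    """Return (StyleName, SelectedStyle) from a b:Sources root element."""
    root = ET.fromstring(sources_xml)
    # Both attributes are unprefixed, so they are looked up without a namespace.
    return root.get("StyleName"), root.get("SelectedStyle")


APA = (
    '<b:Sources SelectedStyle="/APASixthEditionOfficeOnline.xsl" '
    f'StyleName="APA" xmlns:b="{B_NS}"/>'
)
CHICAGO = f'<b:Sources SelectedStyle="/CHICAGO.XSL" StyleName="Chicago" xmlns:b="{B_NS}"/>'

print(selected_style(APA))      # ('APA', '/APASixthEditionOfficeOnline.xsl')
print(selected_style(CHICAGO))  # ('Chicago', '/CHICAGO.XSL')
```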

There are many different citation types available in Word, which share some of the same data tags but also have some of their own.
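Because source types differ in which tags they carry, any lookup probably needs to tolerate missing fields. The following Python sketch indexes sources by `b:Tag` and summarizes one record, returning `None` for anything absent. The sample is a trimmed copy of the Sam02 record above; `sources_by_tag` and `summarize` are hypothetical names, and the choice of summary fields is only an assumption about what a target format might want.

```python
import xml.etree.ElementTree as ET

B_NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
NS = {"b": B_NS}

# Trimmed copy of the source record from this ticket.
SAMPLE = f"""
<b:Sources SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA"
           xmlns:b="{B_NS}" xmlns="{B_NS}">
  <b:Source>
    <b:Tag>Sam02</b:Tag>
    <b:SourceType>Book</b:SourceType>
    <b:Author><b:Author><b:NameList><b:Person>
      <b:Last>Paul</b:Last><b:First>Sam</b:First><b:Middle>Jim</b:Middle>
    </b:Person></b:NameList></b:Author></b:Author>
    <b:Title>Cell</b:Title>
    <b:Year>2002</b:Year>
  </b:Source>
</b:Sources>
"""


def sources_by_tag(sources_root):
    """Map each b:Tag value to its b:Source element."""
    index = {}
    for source in sources_root.findall("b:Source", NS):
        tag = source.findtext("b:Tag", namespaces=NS)
        if tag:
            index[tag] = source
    return index


def summarize(source):
    """Collect a few common fields; tags a source type lacks come back as None."""
    people = source.findall("b:Author/b:Author/b:NameList/b:Person", NS)
    return {
        "type": source.findtext("b:SourceType", namespaces=NS),
        "authors": [p.findtext("b:Last", default="", namespaces=NS) for p in people],
        "title": source.findtext("b:Title", namespaces=NS),
        "year": source.findtext("b:Year", namespaces=NS),
    }


index = sources_by_tag(ET.fromstring(SAMPLE))
print(summarize(index["Sam02"]))
# {'type': 'Book', 'authors': ['Paul'], 'title': 'Cell', 'year': '2002'}
```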

MS Word reference bibliographies

Authors can choose to insert an automatically generated bibliography into the Word document, based on the references cited. As XSweet currently stands, this bibliography is not extracted and is missing from the HTML. However, it should be relatively simple to ensure that the bibliography is extracted and passed through as text into the HTML.

Zotero citations

Zotero citations, inserted into Word documents via the Zotero plugin, can be set to appear as either endnotes or footnotes. They come out in the HTML exactly like any other endnotes and footnotes, with no special class or other identifiers. If there are existing, non-Zotero citation endnotes or footnotes in the .docx, the Zotero citations will mix in with them.

Zotero manages the text and format of the inserted references, and will also generate a bibliography from them. If a bibliography is inserted, it is extracted into the HTML as a simple series of <p class="Bibliography" ...> paragraphs, one for each source.
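For downstream processing, the class name is the only hook available. A minimal Python sketch of collecting those paragraphs from the extracted HTML, using only the standard library; the sample HTML and its placeholder entries are invented for illustration, and `BibliographyCollector` is a hypothetical name.

```python
from html.parser import HTMLParser


class BibliographyCollector(HTMLParser):
    """Collect the text of every <p> whose class list includes "Bibliography"."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._buffer = None  # text chunks while inside a matching <p>, else None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "Bibliography" in classes:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Join the chunks and normalize runs of whitespace.
            self.entries.append(" ".join("".join(self._buffer).split()))
            self._buffer = None


# Hypothetical sample of extracted HTML; the entries are placeholders.
SAMPLE_HTML = """
<p>Body text with a note.</p>
<p class="Bibliography">Placeholder entry <i>one</i>.</p>
<p class="Bibliography">Placeholder entry two.</p>
"""

collector = BibliographyCollector()
collector.feed(SAMPLE_HTML)
print(collector.entries)  # ['Placeholder entry one.', 'Placeholder entry two.']
```

This only recovers each entry as a flat string; splitting an entry into author, title, and year would require parsing the formatted text, since no structured data accompanies it in the HTML.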

So unlike MS Word references, Zotero references embedded in Word are not tagged semantically.

Edited Feb 12, 2019 by Alex Theg