Word and Zotero citation formats
The two citation formats that would be good to capture are:
- References added with Word's own References tool
- Zotero references in Word
This ticket describes the format of both kinds of references, and we should decide on a target output format.
MSWord References
References added from within recent versions of Word use the following format:
Inline reference callouts from word/document.xml
In the visible text of a Word document, an inserted citation appears as a parenthetical, for example (Paul, 2002).
Users choose one reference style for the document (APA, Chicago, etc.), which determines the format both of the bibliography and the inline reference callouts.
The OOXML for a citation is as follows:
<w:sdt>
<w:sdtPr><w:id w:val="836273690"/><w:citation/></w:sdtPr>
<w:sdtContent>
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r>
<w:instrText xml:space="preserve">CITATION Sam02 \l 1033</w:instrText>
</w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
<w:r>
<w:rPr><w:noProof/></w:rPr>
<w:t>(Paul, 2002)</w:t>
</w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:sdtContent>
</w:sdt>
I believe the Sam02 tag in this block, <w:instrText xml:space="preserve">CITATION Sam02 \l 1033</w:instrText>, is used to connect the callout to the corresponding full citation and its data: it matches the <b:Tag> value of the source record.
The above text is from the word/document.xml file. The full citation with all its data generally lives in a file called customXml/item1.xml. There's a bit more to it than that, but we'll consider it true for the purpose of this ticket.
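To make the callout structure concrete, here is a minimal Python sketch that pulls the citation tag and the visible callout text out of document.xml content. It assumes the tag is always the first token after CITATION in the field instruction, which matches the sample above but has not been checked against other documents. The function name citation_callouts and the SAMPLE string are made up for illustration and are not part of XSweet.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Assumption: the tag is the first token after the CITATION keyword.
CITATION_RE = re.compile(r"^\s*CITATION\s+(\S+)")

# Trimmed copy of the sample above, wrapped so the w: prefix is declared.
SAMPLE = r"""<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:sdt>
<w:sdtPr><w:id w:val="836273690"/><w:citation/></w:sdtPr>
<w:sdtContent>
<w:r><w:fldChar w:fldCharType="begin"/></w:r>
<w:r><w:instrText xml:space="preserve">CITATION Sam02 \l 1033</w:instrText></w:r>
<w:r><w:fldChar w:fldCharType="separate"/></w:r>
<w:r><w:rPr><w:noProof/></w:rPr><w:t>(Paul, 2002)</w:t></w:r>
<w:r><w:fldChar w:fldCharType="end"/></w:r>
</w:sdtContent>
</w:sdt>
</w:document>"""


def citation_callouts(document_xml: str) -> list[tuple[str, str]]:
    """Return (tag, visible text) for each Word citation in the XML string."""
    root = ET.fromstring(document_xml)
    callouts = []
    for sdt in root.iter(f"{{{W_NS}}}sdt"):
        # Only structured document tags marked with <w:citation/> are citations.
        if sdt.find("w:sdtPr/w:citation", NS) is None:
            continue
        instruction = "".join(
            node.text or "" for node in sdt.iter(f"{{{W_NS}}}instrText")
        )
        match = CITATION_RE.match(instruction)
        if match is None:
            continue
        # The rendered callout is the text of the <w:t> runs inside the field.
        visible = "".join(node.text or "" for node in sdt.iter(f"{{{W_NS}}}t"))
        callouts.append((match.group(1), visible))
    return callouts


print(citation_callouts(SAMPLE))  # [('Sam02', '(Paul, 2002)')]
```

The returned tag is the value to look up in the source data, so whatever output format we settle on could carry it as an identifier on the callout.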
Full citation data
Citation data, stored in a customXml file, takes the general format of the following example:
<b:Sources SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA" xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography" xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
<b:Source>
<b:Tag>Sam02</b:Tag>
<b:SourceType>Book</b:SourceType>
<b:Guid>{A7E1436C-B7FE-2441-A039-CD8DA09E5981}</b:Guid>
<b:Author>
<b:Author>
<b:NameList>
<b:Person>
<b:Last>Paul</b:Last>
<b:First>Sam</b:First>
<b:Middle>Jim</b:Middle>
</b:Person>
</b:NameList>
</b:Author>
<b:Editor>
<b:NameList>
<b:Person>
<b:Last>Zimms</b:Last>
<b:First>Steven</b:First>
<b:Middle>Carl</b:Middle>
</b:Person>
</b:NameList>
</b:Editor>
<b:Translator>
<b:NameList>
<b:Person>
<b:Last>Slims</b:Last>
<b:First>Handy</b:First>
</b:Person>
</b:NameList>
</b:Translator>
</b:Author>
<b:Title>Cell</b:Title>
<b:City>Davis</b:City>
<b:StateProvince>CA</b:StateProvince>
<b:CountryRegion>USA</b:CountryRegion>
<b:Publisher>Avid</b:Publisher>
<b:Year>2002</b:Year>
<b:Volume>1</b:Volume>
<b:NumberVolumes>1</b:NumberVolumes>
<b:Pages>107</b:Pages>
<b:ShortTitle>C</b:ShortTitle>
<b:StandardNumber>2</b:StandardNumber>
<b:Edition>Fourth</b:Edition>
<b:Comments>Bob Loblaw's Law Blog</b:Comments>
<b:RefOrder>1</b:RefOrder>
</b:Source>
...
</b:Sources>
All of the reference data lives inside a <b:Sources> element. The <b:Sources> element declares the namespaces and records the Word document's selected bibliographic style. The chosen style doesn't affect the format or content of the underlying source data.
<b:Sources
SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA"
xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
>
or, as another example:
<b:Sources
SelectedStyle="/CHICAGO.XSL" StyleName="Chicago"
...continues as above
There are many different citation types available in Word, which share some of the same data tags but also have some of their own.
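As a rough illustration of how the source data could be read, the following Python sketch loads a <b:Sources> document into a dictionary keyed by <b:Tag>, so that a callout tag such as Sam02 can be resolved to its record. It only reads a handful of fields and only the primary authors. The function name load_sources, the output dictionary shape, and the trimmed SAMPLE are assumptions for illustration, not a proposed target format.

```python
import xml.etree.ElementTree as ET

B_NS = "http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
NS = {"b": B_NS}

# Trimmed version of the example above: one source, one author, one editor.
SAMPLE = """<b:Sources SelectedStyle="/APASixthEditionOfficeOnline.xsl" StyleName="APA"
 xmlns:b="http://schemas.openxmlformats.org/officeDocument/2006/bibliography"
 xmlns="http://schemas.openxmlformats.org/officeDocument/2006/bibliography">
<b:Source>
<b:Tag>Sam02</b:Tag>
<b:SourceType>Book</b:SourceType>
<b:Author>
<b:Author><b:NameList><b:Person>
<b:Last>Paul</b:Last><b:First>Sam</b:First><b:Middle>Jim</b:Middle>
</b:Person></b:NameList></b:Author>
<b:Editor><b:NameList><b:Person>
<b:Last>Zimms</b:Last><b:First>Steven</b:First>
</b:Person></b:NameList></b:Editor>
</b:Author>
<b:Title>Cell</b:Title>
<b:Year>2002</b:Year>
</b:Source>
</b:Sources>"""


def load_sources(sources_xml: str) -> tuple[str, dict[str, dict]]:
    """Return the style name and a dict of source records keyed by <b:Tag>."""
    root = ET.fromstring(sources_xml)
    style = root.get("StyleName", "")
    sources = {}
    for source in root.findall("b:Source", NS):
        tag = source.findtext("b:Tag", "", NS)
        # Contributors are nested as Author/<role>/NameList/Person; this path
        # selects the primary authors and skips editors and translators.
        authors = [
            (person.findtext("b:Last", "", NS), person.findtext("b:First", "", NS))
            for person in source.findall("b:Author/b:Author/b:NameList/b:Person", NS)
        ]
        sources[tag] = {
            "type": source.findtext("b:SourceType", "", NS),
            "authors": authors,
            "title": source.findtext("b:Title", "", NS),
            "year": source.findtext("b:Year", "", NS),
        }
    return style, sources


style, sources = load_sources(SAMPLE)
print(style)             # APA
print(sources["Sam02"])
```

Because different source types carry different tags, a real implementation would probably copy every child element rather than a fixed list of fields. The fixed list here just keeps the sketch short.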
MS Word reference bibliographies
Authors can choose to insert an automatically generated bibliography into the Word document, based on the references cited. As XSweet currently stands, this bibliography is not extracted and is removed from the HTML. However, it should be relatively simple to ensure that the bibliography is extracted and passed through as text into the HTML.
Zotero citations
Zotero citations, inserted into Word documents via the Zotero plugin, can be set to appear as either endnotes or footnotes. They come out in the HTML exactly like any other endnotes and footnotes, with no special class or other identifiers. If there are existing, non-Zotero citation endnotes or footnotes in the .docx, the Zotero citations will mix in with them.
Zotero manages the text and format of the inserted references, and will also generate a bibliography from them. If a bibliography is inserted, it is extracted into the HTML as a simple series of <p class="Bibliography" ...> paragraphs, one for each source.
So unlike MS Word references, Zotero references embedded in Word are not tagged semantically.
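Since the class attribute is the only hook available for Zotero bibliographies, downstream tooling would have to select on it. The following Python sketch collects the text of each <p class="Bibliography"> paragraph from the extracted HTML. The helper name bibliography_entries and the sample HTML (including the wording of the entry) are hypothetical; the actual entry text depends on the style chosen in Zotero.

```python
from html.parser import HTMLParser


class BibliographyCollector(HTMLParser):
    """Collect the text of every <p> whose class list includes 'Bibliography'."""

    def __init__(self):
        super().__init__()
        self.entries = []
        self._buffer = None  # text chunks gathered while inside a matching <p>

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "Bibliography" in classes:
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            self.entries.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        # Inline markup such as <i> is dropped; only its text is kept.
        if self._buffer is not None:
            self._buffer.append(data)


def bibliography_entries(html: str) -> list[str]:
    """Return one plain-text string per bibliography paragraph."""
    collector = BibliographyCollector()
    collector.feed(html)
    collector.close()
    return collector.entries


# Hypothetical extracted HTML; the entry wording is invented for this example.
SAMPLE_HTML = """
<p>An ordinary paragraph.</p>
<p class="Bibliography">Paul, Sam. <i>Cell</i>. Davis: Avid, 2002.</p>
"""

print(bibliography_entries(SAMPLE_HTML))  # ['Paul, Sam. Cell. Davis: Avid, 2002.']
```

This recovers each entry as an unstructured string only. Getting back to author, title, and year fields would need either parsing the formatted text or reading the data Zotero itself embeds in the .docx, which this ticket does not cover.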