Feature Proposal: Upload files using PubSweet’s job queue
This proposal would change how Editoria uploads MSWord (.docx) files into the Book Builder. Rather than sending .docx files to a standalone web service (INK) for conversion, the proposed update would allow Editoria to use its own backend (PubSweet) and a Docker container to do the same transformation internally.
This is a behind-the-scenes change that would not alter the upload interface or the XSweet docx-to-HTML conversion results. It would, however, make Editoria’s installation process easier, as users wouldn’t have to also separately install INK and configure Editoria to communicate with it. It would also improve Editoria’s stability and simplify its architecture, and it has the potential to shorten the upload time. Finally, it would make updating XSweet easier (specifically, swapping XSLT sheets into and out of the transformation pipeline).
Editoria uses XSweet to extract and transform the contents of .docx files into HTML that can be loaded in Editoria's Wax editor.
Right now, Editoria sends uploaded MSWord files to a completely separate app - INK - which runs the .docx files through XSweet and sends the converted results back to Editoria. INK runs the different steps of the XSweet pipeline - a series of XSLT files - as an INK “recipe” written in Ruby.
INK is essentially a job queue designed to manage tasks like this conversion. However, Editoria’s backend, the PubSweet server, has been upgraded and now has its own built-in job runner, as well as a PubSweet job component that runs XSweet in a Docker container (pubsweet-component-job-xsweet). The PubSweet job runner would replace INK, and the specific job component pubsweet-component-job-xsweet would replace the Ruby implementation of the XSweet pipeline.
Currently, the upload process looks like this:
- In Editoria, users select one or more .docx files to upload.
- For each .docx file, Editoria sends a separate HTTP POST request (as an XHR) to the INK server (a Ruby on Rails app external to PubSweet) and awaits a response.
- INK receives the .docx file and runs it through XSweet to produce the HTML to be loaded into Editoria. The different steps of the XSweet pipeline - a series of XSLT files - are chained together as an INK “recipe” written in Ruby.
- INK sends the result back to Editoria, which then loads the transformed .docx contents into a Book Builder component.
The INK server must be running independently from Editoria, or else the upload fails. Further, the INK endpoint that Editoria sends the request to must be manually specified in Editoria’s configs or env variables during Editoria’s initial setup.
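The current flow can be sketched as follows: one POST per selected file, all addressed to an externally configured endpoint, with the upload failing when that endpoint is missing. This is only an illustration. The environment variable name INK_ENDPOINT, the UploadRequest shape, and the function buildInkRequests are hypothetical and are not taken from Editoria's code.

```typescript
// Hypothetical names throughout: INK_ENDPOINT, UploadRequest and
// buildInkRequests are illustrative, not part of Editoria or INK.
interface UploadRequest {
  method: "POST";
  url: string;
  fileName: string;
}

// Build one POST request descriptor per selected .docx file.
// Throws when the external endpoint has not been configured.
function buildInkRequests(
  fileNames: string[],
  env: Record<string, string | undefined>
): UploadRequest[] {
  const url = env["INK_ENDPOINT"];
  if (!url) {
    throw new Error("INK_ENDPOINT is not set; uploads cannot reach INK");
  }
  return fileNames.map(
    (fileName): UploadRequest => ({ method: "POST", url, fileName })
  );
}
```

With a preconfigured internal endpoint, as proposed, the configuration lookup and its failure path would no longer be the installer's concern.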
If Editoria is updated to upload .docx files using the PubSweet job runner instead of INK, the upload process would look similar to the above, but with some significant benefits:
- In Editoria, users select .docx files to upload.
- Each .docx is sent as an HTTP POST request to a preconfigured internal PubSweet endpoint, which registers the .docx conversion as a job to run with PubSweet’s job runner.
- Once the conversion is complete, the converted .docx is returned as HTML, completing the HTTP POST request.
A Docker container - rather than an INK server - needs to be running for this to work, but it is far simpler and less resource intensive than running a separate Ruby on Rails app (INK). Users wouldn’t have to install INK as part of the Editoria installation process, nor add configuration to Editoria so that it knows where INK is installed.
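The register-then-convert flow described above can be sketched with a small in-memory queue. This is a stand-in for illustration only, not PubSweet's job runner API: the class JobQueue and its methods submit, runNext and resultOf are hypothetical, and the convert callback takes the place of the XSweet work done inside the Docker container.

```typescript
// In-memory stand-in for a job queue. This is NOT PubSweet's job runner API;
// JobQueue, submit, runNext and resultOf are hypothetical names.
type JobState = "queued" | "done" | "failed";

interface Job {
  id: number;
  docx: string; // stands in for the uploaded file's contents
  state: JobState;
  html?: string;
  error?: string;
}

class JobQueue {
  private jobs: Job[] = [];

  // Called by the upload endpoint: register a conversion job, return its id.
  submit(docx: string): number {
    const id = this.jobs.length + 1;
    this.jobs.push({ id, docx, state: "queued" });
    return id;
  }

  // Called by the worker: take the oldest queued job, convert it, and
  // record the outcome. Returns false when nothing is waiting.
  runNext(convert: (docx: string) => string): boolean {
    const job = this.jobs.find((j) => j.state === "queued");
    if (!job) return false;
    try {
      job.html = convert(job.docx);
      job.state = "done";
    } catch (err) {
      job.error = err instanceof Error ? err.message : String(err);
      job.state = "failed";
    }
    return true;
  }

  // Called when completing the original request.
  resultOf(id: number): Job | undefined {
    return this.jobs.find((j) => j.id === id);
  }
}
```

Keeping the queue and the worker as separate roles is what lets the conversion run in its own container while the server only registers jobs and reads results.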
This would also make it easier to swap XSLT files into or out of the XSweet conversion chain, as changing which sheets run, and in what order, would mean updating a simple script rather than building a new XSweet INK recipe as a Ruby gem.
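As a rough illustration of why an ordered script is easy to change, the chain can be modelled as an array of steps applied in sequence. The step names and string transforms below are placeholders standing in for XSLT sheets. They are not XSweet's actual stylesheets, and runPipeline is a hypothetical helper.

```typescript
// Each step stands in for one XSLT sheet; the transforms are placeholders,
// not XSweet's actual stylesheets.
type Step = { name: string; apply: (input: string) => string };

const pipeline: Step[] = [
  { name: "extract", apply: (s) => s.trim() },
  { name: "collapse-whitespace", apply: (s) => s.replace(/\s+/g, " ") },
  { name: "wrap-paragraph", apply: (s) => `<p>${s}</p>` },
];

// Run the steps in array order, feeding each step's output to the next.
function runPipeline(input: string, steps: Step[]): string {
  return steps.reduce((doc, step) => step.apply(doc), input);
}
```

Under this model, adding, removing or reordering a sheet is a one-line edit to the array, for example `pipeline.filter((s) => s.name !== "collapse-whitespace")`.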
Implementing this change would entail the following changes:
- Including the pubsweet-component-job-xsweet module in Editoria
- Updating Editoria’s Docker files to run the job's Docker container, and ensuring it can connect to the Editoria database.
- Updating how Editoria makes the HTTP POST request with the .docx, from sending it to INK, to sending it to the new endpoint for the job runner to pick up. The new request would look something like the curl example in the
Beyond this, the upload could be further improved by:
- Updating how the PS job runner returns the upload's results, from waiting on a lengthy HTTP POST request to using a GraphQL subscription.
- Compiling the XSLT sheets in XSweet so they run faster, and combining some of the XSweet sheets into larger files.