RFC: Generic job/message queue in PubSweet
It's become clear that we need a mechanism for long-running jobs and messaging (in the programming sense, #356) in PubSweet.
Things that need it include document conversion, sending a number of emails, semantic extraction, and notifying interested processes about data updates.
As messaging queues (as described in #356) and job queues are in many respects the same thing, I propose we merge them and address them at the same time, with the same tech stack.
The overall architecture would be very simple, with three parts:
- message producers,
- queue,
- message processors.
For example:
- A message producer could be somewhere in the business logic for accepting a manuscript: when a manuscript is accepted, the message producer adds a message {type: 'manuscript-accepted', manuscriptId: 'someUuid'} to the manuscript queue.
- The queue receives this and notifies any listening message processors.
- A message processor, listening to all messages on the manuscript queue, is notified and proceeds to process the message according to the data it contains (in this case, for example, it sends an acceptance email to the authors of the manuscript).
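To make the flow above concrete, here is a minimal in-memory sketch of the three parts. The Queue class, the acceptManuscript function and the sentEmails array are all hypothetical stand-ins for illustration; a real queue would persist messages and deliver them across processes.

```javascript
// Hypothetical in-memory stand-in for the proposed queue. A real
// implementation would persist messages instead of calling processors directly.
class Queue {
  constructor(name) {
    this.name = name
    this.processors = []
  }

  // Register a message processor for this queue.
  subscribe(processor) {
    this.processors.push(processor)
  }

  // Add a message and notify every listening processor.
  send(message) {
    this.processors.forEach(processor => processor(message))
  }
}

const sentEmails = []
const manuscriptQueue = new Queue('manuscript')

// Message processor: reacts to acceptance messages on the manuscript queue.
manuscriptQueue.subscribe(message => {
  if (message.type === 'manuscript-accepted') {
    // Stand-in for sending the acceptance email to the authors.
    sentEmails.push(`acceptance email for ${message.manuscriptId}`)
  }
})

// Message producer: would be called from the acceptance business logic.
function acceptManuscript(manuscriptId) {
  manuscriptQueue.send({ type: 'manuscript-accepted', manuscriptId })
}

acceptManuscript('someUuid')
```

The producer only knows the queue name and the message shape, so processors can be added or moved to other processes without touching the business logic.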
There are a number of features that this system must have:
- Capability for multiple nodes (distributed)
- Retrying failed jobs
- Progress reporting
- Delaying jobs to occur at an opportune time
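As a rough illustration of what retrying and delaying could mean for a single job, here is a small sketch. The job fields (state, attempts, retryLimit, startAfter) and both function names are assumptions made for this example, not an existing API; distribution and progress reporting are left out.

```javascript
// Hypothetical job record; field names are illustrative only.
function createJob(data, { retryLimit = 3, startAfter = new Date() } = {}) {
  return { data, state: 'created', attempts: 0, retryLimit, startAfter }
}

// Try to run a job once. retryLimit counts retries after the first attempt.
function runJob(job, handler, now = new Date()) {
  if (job.state === 'completed' || job.state === 'failed') return job
  if (now < job.startAfter) return job // delayed: not due yet

  job.attempts += 1
  try {
    handler(job.data)
    job.state = 'completed'
  } catch (error) {
    // Give up once the retries are used up, otherwise mark for retry.
    job.state = job.attempts > job.retryLimit ? 'failed' : 'retry'
  }
  return job
}

// A handler that fails twice before succeeding.
let calls = 0
const flakyHandler = () => {
  calls += 1
  if (calls < 3) throw new Error('temporary failure')
}

const job = createJob({ to: 'author@example.com' })
runJob(job, flakyHandler) // attempt 1 fails, state becomes 'retry'
runJob(job, flakyHandler) // attempt 2 fails, state stays 'retry'
runJob(job, flakyHandler) // attempt 3 succeeds, state becomes 'completed'
```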
Message producers
The required functionality here is straightforward: there needs to be a way for message producers to simply add/send/submit a JSON message to a queue with a name:
const queue = new Queue('queue-name')
queue.send({message: 'hello'})
Queue
The way to address this use case, in the technical sense, is to introduce a job queue (https://github.com/Automattic/kue, https://github.com/OptimalBits/bull), but this almost invariably brings additional service dependencies (such as Redis), which we'd like to avoid if possible, as they make development, debugging and deployment trickier.
Fortunately, Postgres, the one service we already depend on, can also back a proper distributed message queue through SELECT ... FOR UPDATE SKIP LOCKED, available since PostgreSQL 9.5 (https://blog.2ndquadrant.com/what-is-select-skip-locked-for-in-postgresql-9-5/). Until recently, there hasn't been a mature solution using this mechanism in Node.js, but a new package appeared on my radar (pg-boss) and a 3.0 release is around the corner: https://github.com/timgit/pg-boss/issues/79. Using Postgres and pg-boss as the technology backing our queue would keep our stack thin while, at our scale, not losing anything. At the scale of eBay or Facebook this would likely not hold up, and a dedicated queueing service (Redis or RabbitMQ) would serve better, but I've yet to see a Postgres database reach a real scalability limit in any normal project.
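For readers unfamiliar with SKIP LOCKED, the sketch below shows the general shape of a query that lets several workers claim jobs from one table without blocking each other. The jobs table, its columns and the buildFetchJobQuery function are hypothetical and chosen for illustration; this is not pg-boss's actual schema or query.

```javascript
// Hypothetical table: jobs(id, queue, state, data, created_at).
// This is NOT pg-boss's schema, only an illustration of the pattern.
function buildFetchJobQuery(queueName) {
  return {
    // The inner SELECT locks one waiting row; rows already locked by
    // another worker are skipped rather than waited on.
    text: `
      UPDATE jobs
         SET state = 'active'
       WHERE id = (
               SELECT id
                 FROM jobs
                WHERE queue = $1
                  AND state = 'created'
                ORDER BY created_at
                LIMIT 1
                  FOR UPDATE SKIP LOCKED
             )
      RETURNING id, data`,
    values: [queueName],
  }
}

const query = buildFetchJobQuery('manuscript')
```

With node-postgres, an object of this shape can be passed to client.query(query). Each worker running it concurrently should receive a different job, or no rows when the queue is empty.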
In terms of features, pg-boss matches what we need, with two exceptions:
- It polls the database (once per second by default) instead of using LISTEN/NOTIFY (Postgres' pubsub). The author seems to have good reasons (https://github.com/timgit/pg-boss/issues/24), but this is something we could look into and improve upstream, as there are solutions in other languages that use this combination (https://github.com/treycucco/pypgq, https://github.com/mbreit/pg_jobs, https://github.com/gavinwahl/django-postgres-queue)
- It doesn't have progress reporting, but the author suggests splitting jobs into multiple parts and then using job completion as a way to measure progress. Again this is something we could look into and improve, as other solutions out there do have built-in progress reporting.
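The workaround for progress reporting mentioned above can be sketched as follows: split one large job into parts and derive progress from how many parts have completed. The function names, the part shape and the 'completed' state label are assumptions for this example, not part of pg-boss.

```javascript
// Split one large job into smaller parts that can be queued separately.
function splitIntoParts(parentId, items, partSize) {
  const parts = []
  for (let i = 0; i < items.length; i += partSize) {
    parts.push({ parentId, items: items.slice(i, i + partSize), state: 'created' })
  }
  return parts
}

// Progress is the fraction of parts that have completed (0 to 1).
function progress(parts) {
  if (parts.length === 0) return 1
  const done = parts.filter(part => part.state === 'completed').length
  return done / parts.length
}

// Ten emails, sent in parts of four: three parts in total.
const recipients = Array.from({ length: 10 }, (_, i) => `author${i}@example.com`)
const parts = splitIntoParts('bulk-email-1', recipients, 4)

// Simulate the first part finishing.
parts[0].state = 'completed'
const currentProgress = progress(parts)
```

The granularity of the reported progress is limited by the part size, which is the main trade-off compared with built-in progress reporting.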
Message processors
This is the last bit of the system, but also an interesting one. An example of a message could be a document conversion job: a message could say that file X.docx needs to be converted to HTML. A message processor in this case could be a process that runs XSweet, which depends on Java and Saxon. How do we manage these dependencies sanely in development and in production? My proposal is that we allow message processors to be represented by Docker containers that have all of the required dependencies and contain e.g. a runner.js that responds to the appropriate messages in the queue. Having processors in Docker containers is optional (you can also respond to messages from the same Node.js process), but should be fully supported, as it really widens the spectrum of easily handled jobs.
const queue = new Queue('queue-name')
// Called once for every message that arrives on the queue.
queue.subscribe(message => {
  processMessage(message)
})
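As a sketch of what a runner.js inside such a container might do, the example below dispatches each message to a handler based on its type. Everything here is hypothetical: the createRunner function, the 'convert-docx-to-html' message type and the handler body, which only stands in for the actual XSweet invocation.

```javascript
// runner.js sketch: route each message to the handler for its type.
function createRunner(handlers) {
  return function handleMessage(message) {
    const handler = handlers.get(message.type)
    if (!handler) {
      // Throwing lets the queue record a failure instead of silently
      // dropping a message this container cannot process.
      throw new Error(`No handler for message type: ${message.type}`)
    }
    return handler(message)
  }
}

const handleMessage = createRunner(
  new Map([
    [
      'convert-docx-to-html',
      // Placeholder: a real container would run the conversion here.
      message => ({ file: message.file, status: 'converted' }),
    ],
  ]),
)

const result = handleMessage({ type: 'convert-docx-to-html', file: 'X.docx' })
```

The returned function could be passed directly to queue.subscribe, so the same runner works inside a container or in the main Node.js process.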
Alternatives
There are many, as mentioned, and most rely on Redis. I think we should try to address our needed functionality within the scope described above, and if we can't, start discussing alternatives.
Wrap-up
I believe a system like the one proposed above could be simple and elegant, while adding a lot of functionality that production systems using PubSweet absolutely need.