Feature Proposal: Pagination improvements for Paged.js
This feature proposal describes several of the highest-impact features for development in Paged.js that would most improve the quality of the automatically generated paginated book exported by Editoria and Paged.js with the push of a button.
Paged.js is the tool that takes HTML from Editoria, applies CSS rules to that HTML, and generates a print-ready paginated version of the book. It is under active development, and there are additional features that would continue to improve the pagination. Right now, the automatically paginated books produced with Editoria and Paged.js still require some manual cleanup before they accurately reflect the standard designs used by University of California Press (UCP) and others. Building the features outlined in this proposal would result in the automatically paginated book representing the design specifications with greater fidelity, thus reducing the amount of manual cleanup required.
In addition, with an eye towards producing the best automatically generated paginated output possible as quickly as possible, Prince XML was explored to see whether it might be useful as a pagination tool while more features are being added to Paged.js. However, it appears that the time and resources required to work with Prince XML would be better used improving Paged.js itself.
How Paged.js pagination works with Editoria now
At any time, all Editoria users are able to export and paginate the book in its current form using Paged.js:
- In the “EXPORT BOOK” interface in the top right of the Book Builder, the user selects “PagedJS” from the dropdown menu and presses the “GO” button.
- Editoria assembles the contents of all of the components from the Book Builder into one single HTML file representing the entire book.
- Editoria sends this single HTML file to Paged.js.
- Before the book is paginated, some preprocessing tasks - cleanups, rearranging, etc. - are run on the HTML file.
- Finally, Paged.js takes the HTML input and creates a print-ready paginated version of the book, using CSS rules to style and lay out the HTML.
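The assembly and preprocessing steps above can be illustrated with a minimal sketch. This is not Editoria's actual code: the `Component` shape, the `assemble_book` and `preprocess` functions, and the single cleanup rule (dropping empty paragraphs) are all hypothetical and stand in for whatever Editoria really does before handing the HTML to Paged.js.

```python
from dataclasses import dataclass
from html import escape
from typing import List


@dataclass
class Component:
    """One Book Builder component (hypothetical shape, for illustration only)."""
    title: str
    body_html: str  # HTML fragment holding the component's content


def assemble_book(book_title: str, components: List[Component]) -> str:
    """Concatenate all components, in order, into one HTML document."""
    sections = [
        f"<section>\n<h1>{escape(c.title)}</h1>\n{c.body_html}\n</section>"
        for c in components
    ]
    return (
        "<!DOCTYPE html>\n<html>\n<head>\n"
        f"<title>{escape(book_title)}</title>\n"
        "</head>\n<body>\n" + "\n".join(sections) + "\n</body>\n</html>\n"
    )


def preprocess(html: str) -> str:
    """Stand-in cleanup step (assumption): only removes empty paragraphs."""
    return html.replace("<p></p>", "")


# Hypothetical sample content; the resulting string is what would be sent on
# for pagination.
book = preprocess(assemble_book("Sample Book", [
    Component("Introduction", "<p>Hello.</p><p></p>"),
    Component("What Susan Said", "<p>She said a lot.</p>"),
]))
```

The point of the sketch is only the ordering: one HTML file for the whole book is built first, and cleanups run on that file before any layout happens.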
Paged.js features for development
Adding the following proposed features to Paged.js would result in a PDF that much more accurately reflects UCP’s needs and design specs, and would improve the quality of Editoria’s automated paginated output in general:
- Page balancing: Page balancing ensures that facing left- and right-hand pages have the same, or almost the same, number of lines. Paged.js does not currently have this feature, so imbalances have to be corrected manually. The number of lines on a page can be tweaked by very slightly changing the word spacing on the page (generally in the last paragraph) to add or remove lines and achieve the desired visual effect. Page balancing is a complex feature to automate, as it can affect and be affected by other controls, such as widow and orphan controls, image placement, etc. pursue.
- Prevent the last word of a paragraph from being hyphenated. This should be a relatively easy control to add but it does not currently exist. pursue.
- Automated image placement adjustment: Currently, images simply appear in the paginated output wherever they are inserted in the text, and placement adjustments are done entirely manually. Adding to Paged.js the ability to specify that images should always be moved to the very top or very bottom of the page would be an important improvement. However, image placement would still need a human check to ensure that pictures appear close to where they’re referenced in the text, that there aren't too many images on one page, that images create a balanced page appearance, etc. pursue.
- Preserve formatting in notes: Currently, inline formatting in notes, such as italics, is dropped from the notes in the paginated output. This requires a fix. pursue.
- Include chapter number in chapter title headings in the endnotes backmatter: When Paged.js paginates a book from Editoria, it places all the endnotes from all the chapters into a single “Notes” backmatter component. The notes appear sequentially, each chapter's notes under a heading that displays the chapter's name. The chapter headings display the chapter titles but not the chapter number. These headings should be updated to include chapter numbering. E.g. the updated headings should read “3. What Susan Said” rather than simply “What Susan Said”. pursue.
- Hyphenation and PDF text search issues: When Paged.js is used on an operating system other than OS X (Mac), hyphenation fails and the PDF printed from the paginated book is not text-searchable. Both work when Paged.js is run on a Mac. One solution is to set Paged.js up to run on a server that doesn't have these issues, then have Editoria send the book contents to that server to be paginated and returned, although that is something of a workaround. This is also apparently a known Chrome issue and a solution from Google may be in the works, but in the meantime a robust solution should be identified. pursue.
- (Longer-term) Footnotes: Add the ability to paginate notes as footnotes on the same page as their note callouts. This is a very complex feature.
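Of the features listed above, the endnote heading change is simple enough to sketch. The following is an illustration only, not Paged.js code: the function name, the `(title, notes)` input shape, and the sample titles other than “What Susan Said” are hypothetical. It also assumes that chapters are numbered consecutively from 1 in book order and that chapters without notes get no heading in the “Notes” component.

```python
from typing import List, Tuple


def numbered_note_headings(chapters: List[Tuple[str, List[str]]]) -> List[str]:
    """Build 'N. Title' headings for the Notes backmatter.

    `chapters` is a list of (title, notes) pairs in book order. The number
    reflects the chapter's position in the book, so chapters that have no
    notes are skipped without shifting later numbers.
    """
    headings = []
    for number, (title, notes) in enumerate(chapters, start=1):
        if notes:  # assumption: no heading for a chapter with no notes
            headings.append(f"{number}. {title}")
    return headings


# Hypothetical sample data.
chapters = [
    ("Opening", ["First note."]),
    ("The Letter", []),
    ("What Susan Said", ["First note.", "Second note."]),
]
headings = numbered_note_headings(chapters)
```

Numbering by position in the book, rather than by position in the notes list, keeps the heading consistent with the chapter number shown in the chapter itself.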
A note on Prince XML
Prince XML is a tool for turning HTML into paginated, print-ready PDFs. It was explored to see whether it was a viable ready-to-use option for paginating books from Editoria, as it has some page layout functionality that is not yet built into Paged.js. It seems the drawbacks of adopting it outweigh the benefits. While it is certainly possible for an organization with a license for Prince XML and development resources to use Prince XML to paginate content from Editoria, it does not appear to make sense as a feature for the Editoria development team to pursue.
- Prince XML uses its own, non-standard, proprietary CSS pseudo-attributes. Before pagination could be used in any meaningful way, a template would have to be built from the ground up. The time it would take to build UCP’s standard design template with Prince XML - essentially starting from scratch - would be better spent improving the existing CSS and Paged.js itself.
- Creating a Prince XML design template would require a developer with Prince XML expertise, familiar with the Prince-specific tricks and tools. This would be expensive and difficult initially, and would significantly limit the universe of people who could tweak or develop new stylesheets, as so many more people know plain CSS. Paged.js also provides a preview window to instantly see the result of changes to the CSS, making developing the CSS much more accessible to people than working in Prince XML.
- Further, as Prince XML relies on Prince-specific features, any templates built for Prince pagination could only ever be used with Prince XML and couldn't be reused anywhere else. This would effectively lock Editoria users into using Prince XML. CSS built to work with Paged.js, on the other hand, is much more reusable, and could be used anywhere CSS can be.
- Only those with the resources to buy one or more relatively expensive Prince XML licenses would be able to benefit from any Prince XML pagination templates. Relying on a closed-source, proprietary, and paid tool is somewhat contradictory to the main goals of the Editoria project: to provide an open source, extensible, free book production tool.
Thus, rather than pursue this further, it would be more efficient to spend the time adding to and improving Paged.js itself.