Import PDF & convert to Kotahi's HTML profile
Description: The purpose of this task is to support importing PDFs into Kotahi through integration with ScienceBeam. ScienceBeam converts PDFs to XML, whereas Kotahi requires HTML (specifically, the Kotahi HTML profile).
Suggested solution: XSweet accepts docx as input, while the rest of the pipeline works with HTML. Convert the PDF to docx by way of ScienceBeam's TEI-XML output, then feed the docx through XSweet for document clean-up and conversion to HTML: PDF -> TEI-XML -> docx -> XSweet -> Wax
Acceptance criteria:
- Ensure HTML is accessible in Wax.
- Extract manuscript metadata (title, abstract and/or author names) and use it to populate the submission form.
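
The metadata criterion could be met by reading the TEI-XML header before the document continues down the pipeline. Below is a minimal sketch in Python, not a definitive implementation. It assumes the TEI-XML follows the common GROBID-style header layout (title under `titleStmt`, authors under `analytic`, abstract under `profileDesc`). The function name `extract_metadata` and the returned dictionary keys are hypothetical and would need mapping to the actual submission form fields.

```python
import xml.etree.ElementTree as ET

# TEI namespace as defined by the TEI Consortium.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text(element):
    """Return the whitespace-normalised text of an element, or '' if it is None."""
    if element is None:
        return ""
    return " ".join("".join(element.itertext()).split())


def extract_metadata(tei_xml):
    """Extract title, abstract and author names from a TEI-XML string.

    Assumption: the header uses the GROBID-style layout described above.
    Missing fields come back as empty values rather than raising.
    """
    root = ET.fromstring(tei_xml)
    header = root.find("tei:teiHeader", TEI_NS)
    if header is None:
        return {"title": "", "abstract": "", "authors": []}

    title = _text(header.find(".//tei:titleStmt/tei:title", TEI_NS))
    abstract = _text(header.find(".//tei:profileDesc/tei:abstract", TEI_NS))

    authors = []
    for pers in header.findall(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        # Join all forenames (first, middle) followed by the surname.
        parts = [_text(f) for f in pers.findall("tei:forename", TEI_NS)]
        parts.append(_text(pers.find("tei:surname", TEI_NS)))
        name = " ".join(p for p in parts if p)
        if name:
            authors.append(name)

    return {"title": title, "abstract": abstract, "authors": authors}
```

Returning empty values for missing fields keeps the import from failing on PDFs where the header could only be partially recognised; the submission form can then be left blank for those fields.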