fix(imports): discard duplicate-DOI papers during colab import #1594
See #1594 (closed).
The import now checks for duplicates by DOI and discards any paper whose DOI has already been seen. I've also tidied the code by extracting functions so the flow is easier to follow, and changed the order of processing so that we no longer hold large numbers of manuscripts in memory only to discard most of them.
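For reviewers, here is a minimal sketch of the general idea, not the code in this PR. It assumes papers arrive as an iterable of dicts with a `doi` key, and the names `normalise_doi` and `discard_duplicate_dois` are hypothetical. Filtering inside a generator means only the set of seen DOIs is kept in memory, rather than every fetched manuscript. Keeping papers that have no DOI is an assumption of this sketch.

```python
from typing import Dict, Iterable, Iterator, Optional


def normalise_doi(doi: Optional[str]) -> Optional[str]:
    """Return a comparable form of a DOI, or None if it is missing or blank.

    DOIs are case-insensitive, so lower-casing before comparison is safe.
    """
    if not doi:
        return None
    cleaned = doi.strip().lower()
    return cleaned or None


def discard_duplicate_dois(
    papers: Iterable[Dict], existing_dois: Iterable[str]
) -> Iterator[Dict]:
    """Yield papers one at a time, skipping any whose DOI was already seen.

    `existing_dois` stands in for DOIs already stored (hypothetical input).
    """
    seen = {normalise_doi(d) for d in existing_dois}
    seen.discard(None)
    for paper in papers:
        doi = normalise_doi(paper.get("doi"))
        if doi is None:
            # Assumption: a paper without a DOI cannot be matched, so keep it.
            yield paper
            continue
        if doi in seen:
            continue  # duplicate: drop it immediately instead of storing it
        seen.add(doi)
        yield paper
```

Because the function is a generator, the caller can save each surviving paper as it is yielded, so discarded manuscripts are never accumulated in a list.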