[NCRC] Abstracts incorrect at import
Some articles are being imported (via the "Refresh" button) with garbled abstracts. We expect the abstract to be a string of text, and for many articles this is what we get. But for a substantial minority, we instead get an array of strings. E.g.:
[
"Recent studies conclude that the global coronavirus (COVID-19) pandemic decreased power sector CO",
" emissions globally and in the United States. In this paper, we analyze the statistical significance of CO",
" emissions reductions in the U.S. power sector from March through December 2020. We use Gaussian process (GP) regression to assess whether CO",
" emissions reductions would have occurred with reasonable probability in the absence of COVID-19 considering uncertainty due to factors unrelated to the pandemic and adjusting for weather, seasonality, and recent emissions trends. We find that monthly CO",
" emissions reductions are only statistically significant in April and May 2020 considering hypothesis tests at 5% significance levels. Separately, we consider the potential impact of COVID-19 on coal-fired power plant retirements through 2022. We find that only a small percentage of U.S. coal power plants are at risk of retirement due to a possible COVID-19-related sustained reduction in electricity demand and prices. We observe and anticipate a return to pre-COVID-19 CO",
" emissions in the U.S. power sector."
]
This comes from an article with the following abstract:
Recent studies conclude that the global coronavirus (COVID-19) pandemic decreased power sector CO2 emissions globally and in the United States. In this paper, we analyze the statistical significance of CO2 emissions reductions in the U.S. power sector from March through December 2020. We use Gaussian process (GP) regression to assess whether CO2 emissions reductions would have occurred with reasonable probability in the absence of COVID-19 considering uncertainty due to factors unrelated to the pandemic and adjusting for weather, seasonality, and recent emissions trends. We find that monthly CO2 emissions reductions are only statistically significant in April and May 2020 considering hypothesis tests at 5% significance levels. Separately, we consider the potential impact of COVID-19 on coal-fired power plant retirements through 2022. We find that only a small percentage of U.S. coal power plants are at risk of retirement due to a possible COVID-19-related sustained reduction in electricity demand and prices. We observe and anticipate a return to pre-COVID-19 CO2 emissions in the U.S. power sector.
Here the text breaks at every subscript '2'. In other manuscripts I've looked at, it breaks wherever there is italic text. Evidently, inline tags such as <i> or <sub> are not being read correctly and are breaking the import. I'm not sure whether these articles were imported from PubMed or bioRxiv.
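To illustrate the suspected cause, here is a minimal Python sketch. I haven't looked at the actual import code, so this is only an assumption about the mechanism: a parser that collects just the direct text nodes of the abstract element would drop anything inside inline tags and return fragments like the ones above. The sample XML and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical abstract markup with inline <sub> and <i> tags.
SAMPLE = (
    "<abstract>decreased power sector CO<sub>2</sub> emissions "
    "in <i>the</i> U.S.</abstract>"
)


def direct_text_fragments(xml: str) -> List[str]:
    """Collect only the root's own text nodes, skipping nested elements.

    This mimics the suspected bug: text inside <sub> or <i> is lost and
    the abstract comes back as a list of fragments.
    """
    root = ET.fromstring(xml)
    fragments = [root.text or ""]
    for child in root:
        # child.tail is the text that follows the child's closing tag.
        fragments.append(child.tail or "")
    return fragments


def full_text(xml: str) -> str:
    """Join every text node, including text inside nested inline tags."""
    root = ET.fromstring(xml)
    return "".join(root.itertext())


if __name__ == "__main__":
    print(direct_text_fragments(SAMPLE))
    print(full_text(SAMPLE))
```

The first function reproduces the shape of the broken output (an array with the '2' missing), while the second keeps the inline text by flattening all nested text nodes into one string.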
Aside from the missing sections of text, the affected abstracts still display OK in the on-screen Wax editor. Some topics may go undetected if their keyphrases fall within the missing text. Also, I don't know what the ramifications will be at the time of export or publishing.
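Until the import is fixed, a guard along these lines might help find the affected records and keep downstream steps from receiving an array where they expect a string. This is only a sketch: the `id` and `abstract` field names and both function names are assumptions, not the real schema, and joining the fragments does not restore the text that was dropped.

```python
from typing import Any, Dict, List, Tuple


def coerce_abstract(value: Any) -> Tuple[str, bool]:
    """Return (text, was_fragmented) for an abstract of either shape.

    Joining fragments yields a single string, but the content of the
    stripped inline tags (e.g. the subscript '2') is still missing.
    """
    if isinstance(value, str):
        return value, False
    if isinstance(value, list):
        return "".join(str(part) for part in value), True
    raise TypeError(f"unexpected abstract type: {type(value).__name__}")


def find_fragmented(articles: List[Dict[str, Any]]) -> List[Any]:
    """List the ids of articles whose abstract was imported as an array."""
    return [a.get("id") for a in articles if isinstance(a.get("abstract"), list)]
```

The boolean flag makes it possible to mark a record for re-import instead of silently treating the joined text as complete.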