Details
- Type: Story
- Status: Done
- Resolution: Done
- Fix Version/s: None
- Component/s: www_lsst_io
- Labels: None
- Story Points: 3.5
- Epic Link:
- Team: SQuaRE
- Urgent?: No
Description
These documents are examples of technotes not being correctly ingested:
- https://dmtn-153.lsst.io (LaTeX technote has no content, only an abstract)
- https://dmtn-152.lsst.io (LaTeX technote has no content, only an abstract)
- https://dmtn-150.lsst.io (reStructuredText technote)
- https://dmtn-146.lsst.io (LaTeX technote has no content, only an abstract)
- https://dmtn-139.lsst.io (no sectioning)
- https://dmtn-138.lsst.io (no sectioning)
- https://dmtn-134.lsst.io (no abstract, intentionally)
- https://dmtn-107.lsst.io (actually fine)
- https://dmtn-106.lsst.io
- https://dmtn-105.lsst.io
- https://dmtn-096.lsst.io (no abstract, but has content)
- https://dmtn-078.lsst.io
- https://dmtn-076.lsst.io
- https://dmtn-075.lsst.io
- https://dmtn-074.lsst.io
- https://dmtn-072.lsst.io (LaTeX technote with no abstract?)
- https://dmtn-065.lsst.io
- https://dmtn-056.lsst.io
- https://dmtn-054.lsst.io
- https://dmtn-052.lsst.io
- https://dmtn-042.lsst.io
- https://dmtn-034.lsst.io
In this ticket we'll analyze these ingests and devise fixes to ensure that these documents appear in the Algolia index. A large part of this work may be creating new diagnostic tools for the ook ingest pipeline.
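As a rough illustration of what such a diagnostic could check, here is a minimal sketch that flags the symptoms noted in the list above (no abstract, no content, no sectioning). The `ParsedDocument` class, its fields, and the `diagnose` function are hypothetical and are not taken from the ook codebase; the sketch also assumes that a document without headings is parsed into a single body section.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedDocument:
    """Hypothetical stand-in for a parsed technote (not an ook class)."""

    handle: str
    abstract: Optional[str] = None
    # Body text of each section, in document order.
    sections: List[str] = field(default_factory=list)


def diagnose(doc: ParsedDocument) -> List[str]:
    """Return human-readable ingest problems found in a parsed document."""
    problems: List[str] = []
    if not (doc.abstract or "").strip():
        problems.append("no abstract")
    # Ignore sections that contain only whitespace.
    body = [text for text in doc.sections if text.strip()]
    if not body:
        problems.append("no content")
    elif len(body) == 1:
        # Assumption: an unsectioned document parses as one body section.
        problems.append("no sectioning")
    return problems


if __name__ == "__main__":
    example = ParsedDocument(handle="EXAMPLE-001", abstract="An abstract.")
    print(example.handle, diagnose(example))
```

Running a check like this across every technote before upload would show which documents are likely to produce an empty or partial set of records.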
While working on the ingest code, let's also see if it's possible to "bin" LaTeX paragraphs together so that each document produces fewer records.
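One possible way to bin paragraphs is a greedy merge: consecutive paragraphs are joined until a size limit would be exceeded, and each bin then becomes one record. The sketch below is only an illustration; the `bin_paragraphs` name and the `max_chars` limit are assumptions, not part of the ook ingest code.

```python
from typing import Iterable, List


def bin_paragraphs(paragraphs: Iterable[str], max_chars: int = 1000) -> List[str]:
    """Greedily merge consecutive paragraphs into bins of at most max_chars.

    A single paragraph longer than max_chars is kept as its own bin
    rather than being split.
    """
    separator = "\n\n"
    bins: List[str] = []
    current: List[str] = []
    current_len = 0
    for paragraph in paragraphs:
        text = paragraph.strip()
        if not text:
            continue  # skip empty paragraphs
        # Length this paragraph adds, including the separator if needed.
        added = len(text) + (len(separator) if current else 0)
        if current and current_len + added > max_chars:
            # The current bin is full: emit it and start a new one.
            bins.append(separator.join(current))
            current = [text]
            current_len = len(text)
        else:
            current.append(text)
            current_len += added
    if current:
        bins.append(separator.join(current))
    return bins


if __name__ == "__main__":
    paragraphs = ["a" * 400, "b" * 400, "c" * 400, "d" * 50]
    for record in bin_paragraphs(paragraphs, max_chars=1000):
        print(len(record))
```

Binning within a section, rather than across section boundaries, would keep each record tied to a single heading; that choice is left open here.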
We fixed many ingest issues:
Some documents are still posing issues: