DM-10961: Add metasrc function to clean LaTeX source (remove comments, insert included documents)

    Details

      Description

      As a pre-processing step before extracting other LaTeX commands, metasrc should filter the LaTeX source:

      • Remove comments.
      • Insert the contents of tex files referenced with \input and \include.
      • Replace simple macros (those made with \def and \newcommand).
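The three filtering steps above can be sketched roughly as follows. This is a minimal illustration and not metasrc's actual API: the function names (remove_comments, inline_inputs, replace_macros, clean_latex) and the regular expressions are hypothetical. It assumes included files live under a single root directory, handles only zero-argument macros with brace-free bodies, and does not treat a percent sign after an escaped backslash (\\%) as a comment.

```python
import re
from pathlib import Path

# An unescaped % and everything after it on the same line.
COMMENT_RE = re.compile(r"(?<!\\)%.*")
# \input{name} or \include{name} (not \includegraphics).
INPUT_RE = re.compile(r"\\(?:input|include)\{([^}]+)\}")
# Zero-argument definitions whose bodies contain no nested braces.
DEF_RE = re.compile(r"\\def\s*(\\[A-Za-z]+)\s*\{([^{}]*)\}")
NEWCOMMAND_RE = re.compile(
    r"\\newcommand\s*\{?\s*(\\[A-Za-z]+)\s*\}?\s*\{([^{}]*)\}")


def remove_comments(text):
    """Strip LaTeX comments, leaving escaped percent signs (\\%) alone."""
    return COMMENT_RE.sub("", text)


def inline_inputs(text, root, depth=0, max_depth=10):
    """Replace \\input/\\include commands with the referenced file contents.

    Commands whose file cannot be found are left untouched. max_depth
    guards against files that include each other in a loop.
    """
    def _replace(match):
        path = Path(root) / match.group(1).strip()
        if path.suffix != ".tex":
            path = path.with_name(path.name + ".tex")
        if depth >= max_depth or not path.is_file():
            return match.group(0)
        included = remove_comments(path.read_text(encoding="utf-8"))
        return inline_inputs(included, root, depth + 1, max_depth)

    return INPUT_RE.sub(_replace, text)


def replace_macros(text):
    """Expand simple macros and drop their definitions from the text."""
    macros = {}
    for pattern in (DEF_RE, NEWCOMMAND_RE):
        for name, body in pattern.findall(text):
            macros[name] = body
        text = pattern.sub("", text)
    for name, body in macros.items():
        # The lookahead keeps \foo from matching the start of \foobar.
        text = re.sub(re.escape(name) + r"(?![A-Za-z])",
                      lambda m, b=body: b, text)
    return text


def clean_latex(text, root="."):
    """Run the three filters in the order listed in the description."""
    text = remove_comments(text)
    text = inline_inputs(text, root)
    return replace_macros(text)
```

Running the macro step after inlining means that macros defined in an included file are expanded too. Regex-based handling like this remains fragile for anything beyond these simple cases, such as macros with arguments or nested braces.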

            Activity

            jsick Jonathan Sick added a comment -

            Tim Jenness: can you review this proposed update to Lander that improves metadata extraction from TeX documents (including the title, abstract and authors)?

            There are two PRs:

            1. Update to metasrc, with the bulk of the LaTeX processing code. https://github.com/lsst-sqre/metasrc/pull/3
            2. Update to Lander to use the new metasrc APIs. Includes bug fixes for Travis PR builds too (I've made intermediate releases containing this code already). https://github.com/lsst-sqre/lander/pull/2

            tjenness Tim Jenness added a comment -

            As I say in the PR, this looks fine but I'm concerned that we are going down the "write a latex parser" rabbit hole and we might end up continually tweaking this. We have to keep an eye on it.

            Does this mean DM-10928 is now invalid? I'm not sure we have two tickets: one saying "improve the metadata extraction" should have been fine if that is the case.

            jsick Jonathan Sick added a comment -

            I completely agree that writing regexes against LaTeX is a poor solution; I don't feel great doing it. Maybe it'd be fine if we impose rigid standards for writing the tex commands that are parsed for metadata.

            I see DM-10928 as the next step. Now that we have a working solution we can begin to experiment in DM-10928 with replacing custom parsing code implemented here with scraping the pandoc output for metadata.

            jsick Jonathan Sick added a comment - - edited

            Quoting Tim Jenness: "I'm not sure we have two tickets: one saying 'improve the metadata extraction' should have been fine if that is the case."

            I think that your suggested approach in DM-10928, to pre-extract things like the title and abstract before passing the document through pandoc, requires all of this functionality anyway.

            In any case, I don't think the statement that one ticket should have been fine is called for.

            jsick Jonathan Sick added a comment -

            Distributed in lander==0.1.5 and metasrc==0.1.3


              People

              • Assignee: Jonathan Sick (jsick)
              • Reporter: Jonathan Sick (jsick)
              • Reviewers: Tim Jenness
              • Watchers: Jonathan Sick, Tim Jenness