Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-22540

Include only real acronyms in acronyms.tex

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Team:
      Architecture

      Description

      The script generateAcronyms.py should be updated in order to include only acronyms in the acronyms table, when executed with the -t option (generate acronyms table)

        Attachments

          Issue Links

            Activity

            Hide
            gcomoretto Gabriele Comoretto added a comment -

            PR available here:

             

            https://github.com/lsst/lsst-texmf/pull/210

             

            Thanks to Brian Van Klaveren for helping to improve the implementation.

            Show
            gcomoretto Gabriele Comoretto added a comment - PR available here:   https://github.com/lsst/lsst-texmf/pull/210   Thanks to Brian Van Klaveren for helping to improve the implementation.
            Hide
            swinbank John Swinbank added a comment -

            Per my comment on GitHub — I have no argument with the code here, but I don't think this functionality is actually meaningful.

            Show
            swinbank John Swinbank added a comment - Per my comment on GitHub — I have no argument with the code here, but I don't think this functionality is actually meaningful.
            Hide
            gcomoretto Gabriele Comoretto added a comment -

            I disagree, I am producing a large number of test documents that have the acronyms table.

            From my understanding talking to Wil O'Mullane this is better than having the glossary.

             

            Show
            gcomoretto Gabriele Comoretto added a comment - I disagree, I am producing a large number of test documents that have the acronyms table. From my understanding talking to  Wil O'Mullane this is better than having the glossary.  
            Hide
            swinbank John Swinbank added a comment -

            Well, I guess it depends what you're asking for here.

            The ticket asks that only acronyms should be included in the acronyms table. As written, the code does not achieve that goal. Further, I can't see any way to achieve that goal given the approach that you have taken.

            If you have chosen to recast the ticket as “include only items in block capitals in the acronyms table”, then this code seems fine. I don't think that's a useful thing to do, but if you assure me that's what you & Wil O'Mullane want, then I won't block you from merging it. (Although I would appreciate an explanation as to why you think it's useful, because it sure beats me!)

            I would request that you add a comment explaining what the regexp is intended to achieve (in particular, the thinking behind the lookahead assertion isn't obvious to me).

            Show
            swinbank John Swinbank added a comment - Well, I guess it depends what you're asking for here. The ticket asks that only acronyms should be included in the acronyms table. As written, the code does not achieve that goal. Further, I can't see any way to achieve that goal given the approach that you have taken. If you have chosen to recast the ticket as “include only items in block capitals in the acronyms table”, then this code seems fine. I don't think that's a useful thing to do, but if you assure me that's what you & Wil O'Mullane want, then I won't block you from merging it. (Although I would appreciate an explanation as to why you think it's useful, because it sure beats me!) I would request that you add a comment explaining what the regexp is intended to achieve (in particular, the thinking behind the lookahead assertion isn't obvious to me).
            Hide
            gcomoretto Gabriele Comoretto added a comment -

            I agree that to be coherent, we shall define in the glossarydefs.csv which entry is an acronym and which is a glossary definition.

            The unused fields in the CSV file Documentation Tags could be used for that purpose, or a new field could be added. The number of entries to characterize should not be a big deal, I can do it. Maintaining the information should also not be a problem since each change needs to be reviewed. Of course, the logic in generateAcronyms.py would need a few bigger changes.

             

            Producing automatically a large number of test documents including glossary definitions looks repetitive, unnecessary and just makes the documents bigger. Definitions are already available in the applicable documents.

            Having a compact list of abbreviations fits better the scope of the test specifications and test plans and reports.

             

            Maybe, as you suggest, I should update the issues, asking to include instead, an Abbreviations table at the end of the document, where abbreviation are strings defined as follows:

            • contain only upper case characters, numbers and "/"s
            • starting with an upper case character or a number
            • can't be composed only by numbers

             

            Note also that, as per https://github.com/lsst/lsst-texmf/blob/master/bin/generateAcronyms.py#L352, I am just reusing (and improving, in master uses only upper cases characters) the definition of acronym-like strings, made originally in the script. If this is not a valid definition anymore, the script should be reviewed.

            Show
            gcomoretto Gabriele Comoretto added a comment - I agree that to be coherent, we shall define in the glossarydefs.csv which entry is an acronym and which is a glossary definition. The unused fields in the CSV file  Documentation Tags  could be used for that purpose, or a new field could be added. The number of entries to characterize should not be a big deal, I can do it. Maintaining the information should also not be a problem since each change needs to be reviewed. Of course, the logic in generateAcronyms.py would need a few bigger changes.   Producing automatically a large number of test documents including glossary definitions looks repetitive, unnecessary and just makes the documents bigger. Definitions are already available in the applicable documents. Having a compact list of abbreviations fits better the scope of the test specifications and test plans and reports.   Maybe, as you suggest, I should update the issues, asking to include instead, an Abbreviations table at the end of the document, where abbreviation are strings defined as follows: contain only upper case characters, numbers and "/"s starting with an upper case character or a number can't be composed only by numbers   Note also that, as per  https://github.com/lsst/lsst-texmf/blob/master/bin/generateAcronyms.py#L352 , I am just reusing (and improving, in master uses only upper cases characters) the definition of acronym-like strings, made originally in the script. If this is not a valid definition anymore, the script should be reviewed.
            Hide
            tjenness Tim Jenness added a comment -

            I thought master looks for upper case characters that match the regex and ALSO forms a special regex based on the supplied acronyms that are not just made of upper case characters.

            Show
            tjenness Tim Jenness added a comment - I thought master looks for upper case characters that match the regex and ALSO forms a special regex based on the supplied acronyms that are not just made of upper case characters.
            Hide
            gcomoretto Gabriele Comoretto added a comment -

            https://github.com/lsst/lsst-texmf/blob/master/bin/generateAcronyms.py#L42

             

            CAP_ACRONYM = re.compile(r"\b[A-Z][A-Z]+\b")

             

            Just 2 or more upper case characters.

            Show
            gcomoretto Gabriele Comoretto added a comment - https://github.com/lsst/lsst-texmf/blob/master/bin/generateAcronyms.py#L42   CAP_ACRONYM = re.compile(r"\b [A-Z] [A-Z] +\b")   Just 2 or more upper case characters.
            Show
            tjenness Tim Jenness added a comment - Yes, but I'm talking about https://github.com/lsst/lsst-texmf/blob/master/bin/generateAcronyms.py#L344
            Hide
            gcomoretto Gabriele Comoretto added a comment -

            That's the list of defined entries in glossarydefs.csv+myacronyms-skipacronyms.It is used to find all the matches between the glossary definitions and what is used.

            The regular expression I am referring to, is used on line 352 to find all potential acronyms-like strings (as per line 349) in the text. And it is looking on only full uppercase strings.

            From the difference with the matches, the missing ones are found.

            Show
            gcomoretto Gabriele Comoretto added a comment - That's the list of defined entries in glossarydefs.csv+myacronyms-skipacronyms.It is used to find all the matches between the glossary definitions and what is used. The regular expression I am referring to, is used on line 352 to find all potential acronyms-like strings  (as per line 349) in the text. And it is looking on only full uppercase strings. From the difference with the matches, the missing ones are found.
            Hide
            womullan Wil O'Mullane added a comment -

            I have checked this for both glossary and acronyms and it looks good. One small comment on github was fixed. There is still one comment from Tim on the regexp.. it could perhaps be simplified but I would not delay much on merging this. It is an improvement and it is good to flag acronyms separately to glossary entries. 

            Show
            womullan Wil O'Mullane added a comment - I have checked this for both glossary and acronyms and it looks good. One small comment on github was fixed. There is still one comment from Tim on the regexp.. it could perhaps be simplified but I would not delay much on merging this. It is an improvement and it is good to flag acronyms separately to glossary entries. 

              People

              • Assignee:
                gcomoretto Gabriele Comoretto
                Reporter:
                gcomoretto Gabriele Comoretto
                Reviewers:
                Wil O'Mullane
                Watchers:
                Gabriele Comoretto, John Swinbank, Kian-Tat Lim, Tim Jenness, Wil O'Mullane
              • Votes:
                0 Vote for this issue
                Watchers:
                5 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel