Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-13524

Add unit tests for ingestion

    Details

      Description

      As a self-contained module within ap_verify, ingestion should have unit tests of its functions. It may be possible to implement a unit test by specifically ingesting only 1-2 files of each type.

        Attachments

          Issue Links

            Activity

            Hide
            krzys Krzysztof Findeisen added a comment -

            Hi Meredith Rawls, can you review this ticket? In particular, any tests I've missed would be appreciated.

            Show
            krzys Krzysztof Findeisen added a comment - Hi Meredith Rawls , can you review this ticket? In particular, any tests I've missed would be appreciated.
            Hide
            mrawls Meredith Rawls added a comment -

            This looks great. The only test I might add would open up the registries created after all the data products have been ingested and ensure the correct visit, ccdnum, filter, date, etc. have been saved for each one. You could compare it with the `registry.sqlite3` and `calibRegistry.sqlite3` that already exist in `testdata_decam/rawData`. I don't think this is essential, however, so I'll leave it up to you whether you want to add it or not.

            Show
            mrawls Meredith Rawls added a comment - This looks great. The only test I might add would open up the registries created after all the data products have been ingested and ensure the correct visit, ccdnum, filter, date, etc. have been saved for each one. You could compare it with the `registry.sqlite3` and `calibRegistry.sqlite3` that already exist in `testdata_decam/rawData`. I don't think this is essential, however, so I'll leave it up to you whether you want to add it or not.
            Hide
            krzys Krzysztof Findeisen added a comment -

            A test like that sounds like it would be sensitive to the details of how registries/repositories work. Can we do something equivalent through the Butler abstraction (which I guess is the same as asking if I should be checking any info that's not in the data IDs)?

            Show
            krzys Krzysztof Findeisen added a comment - A test like that sounds like it would be sensitive to the details of how registries/repositories work. Can we do something equivalent through the Butler abstraction (which I guess is the same as asking if I should be checking any info that's not in the data IDs)?
            Hide
            mrawls Meredith Rawls added a comment -

            That's true. At present, the registries do include info that is not in the dataId. The raw image registry has fields (id, hdu, instcal, wtmap, object, visit, taiObs, filter, ccdnum, dqmask, date, ccd, proposal, expTime) while the calib registry has fields (id, filter, validStart, path, calibDate, ccdnum, validEnd). Something occurs during ingestion to populate the registries with this information, and subsequent Butler calls can get data based on them. In practice, a common behind-the-scenes check that happens when processing begins is that the calibration product valid dates include the raw exposure's date.

            So I guess ideally there would be a test to see if whatever additional header metadata is read during ingestion (e.g., date) lands in the registry appropriately (e.g., try retrieving something with the Butler based on date alone?). It wouldn't surprise me if this fails, though, because one also has to specify a bunch of arguably extraneous information as you've discovered.

            Show
            mrawls Meredith Rawls added a comment - That's true. At present, the registries do include info that is not in the dataId. The raw image registry has fields (id, hdu, instcal, wtmap, object, visit, taiObs, filter, ccdnum, dqmask, date, ccd, proposal, expTime) while the calib registry has fields (id, filter, validStart, path, calibDate, ccdnum, validEnd). Something occurs during ingestion to populate the registries with this information, and subsequent Butler calls can get data based on them. In practice, a common behind-the-scenes check that happens when processing begins is that the calibration product valid dates include the raw exposure's date. So I guess ideally there would be a test to see if whatever additional header metadata is read during ingestion (e.g., date) lands in the registry appropriately (e.g., try retrieving something with the Butler based on date alone?). It wouldn't surprise me if this fails, though, because one also has to specify a bunch of arguably extraneous information as you've discovered.
            Hide
            krzys Krzysztof Findeisen added a comment - - edited

            Hmm... most of those things aren't accessible through the Butler:

            >>> butler.getKeys('raw')
            {'date': <class 'str'>, 'filter': <class 'str'>, 'visit': <class 'int'>,
                'hdu': <class 'int'>, 'ccdnum': <class 'int'>, 'object': <class 'str'>}
            >>> butler.getKeys('cpFlat')
            {'calibDate': <class 'str'>, 'filter': <class 'str'>, 'ccdnum': <class 'int'>}
            

            Show
            krzys Krzysztof Findeisen added a comment - - edited Hmm... most of those things aren't accessible through the Butler: >>> butler.getKeys('raw') {'date': <class 'str'>, 'filter': <class 'str'>, 'visit': <class 'int'>, 'hdu': <class 'int'>, 'ccdnum': <class 'int'>, 'object': <class 'str'>} >>> butler.getKeys('cpFlat') {'calibDate': <class 'str'>, 'filter': <class 'str'>, 'ccdnum': <class 'int'>}
            Hide
            krzys Krzysztof Findeisen added a comment -

            I ended up not adding a test, since it would probably fail due to DM-12672. I'll add a note to that ticket that the extra info you found would be a good regression test.

            Show
            krzys Krzysztof Findeisen added a comment - I ended up not adding a test, since it would probably fail due to DM-12672 . I'll add a note to that ticket that the extra info you found would be a good regression test.

              People

              • Assignee:
                krzys Krzysztof Findeisen
                Reporter:
                krzys Krzysztof Findeisen
                Reviewers:
                Meredith Rawls
                Watchers:
                Krzysztof Findeisen, Meredith Rawls
              • Votes:
                0 Vote for this issue
                Watchers:
                2 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved:

                  Summary Panel