# Add unit tests for ingestion

XMLWordPrintable

## Details

• Type: Story
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
• Story Points:
2
• Sprint:
AP S18-4
• Team:

## Description

As a self-contained module within ap_verify, ingestion should have unit tests of its functions. It may be possible to implement a unit test by specifically ingesting only 1-2 files of each type.

## Activity

Hide
Krzysztof Findeisen added a comment -

Hi Meredith Rawls, can you review this ticket? In particular, any tests I've missed would be appreciated.

Show
Krzysztof Findeisen added a comment - Hi Meredith Rawls , can you review this ticket? In particular, any tests I've missed would be appreciated.
Hide
Meredith Rawls added a comment -

This looks great. The only test I might add would open up the registries created after all the data products have been ingested and ensure the correct visit, ccdnum, filter, date, etc. have been saved for each one. You could compare it with the registry.sqlite3 and calibRegistry.sqlite3 that already exist in testdata_decam/rawData. I don't think this is essential, however, so I'll leave it up to you whether you want to add it or not.

Show
Meredith Rawls added a comment - This looks great. The only test I might add would open up the registries created after all the data products have been ingested and ensure the correct visit, ccdnum, filter, date, etc. have been saved for each one. You could compare it with the registry.sqlite3 and calibRegistry.sqlite3 that already exist in testdata_decam/rawData. I don't think this is essential, however, so I'll leave it up to you whether you want to add it or not.
Hide
Krzysztof Findeisen added a comment -

A test like that sounds like it would be sensitive to the details of how registries/repositories work. Can we do something equivalent through the Butler abstraction (which I guess is the same as asking if I should be checking any info that's not in the data IDs)?

Show
Krzysztof Findeisen added a comment - A test like that sounds like it would be sensitive to the details of how registries/repositories work. Can we do something equivalent through the Butler abstraction (which I guess is the same as asking if I should be checking any info that's not in the data IDs)?
Hide
Meredith Rawls added a comment -

That's true. At present, the registries do include info that is not in the dataId. The raw image registry has fields (id, hdu, instcal, wtmap, object, visit, taiObs, filter, ccdnum, dqmask, date, ccd, proposal, expTime) while the calib registry has fields (id, filter, validStart, path, calibDate, ccdnum, validEnd). Something occurs during ingestion to populate the registries with this information, and subsequent Butler calls can get data based on them. In practice, a common behind-the-scenes check that happens when processing begins is that the calibration product valid dates include the raw exposure's date.

So I guess ideally there would be a test to see if whatever additional header metadata is read during ingestion (e.g., date) lands in the registry appropriately (e.g., try retrieving something with the Butler based on date alone?). It wouldn't surprise me if this fails, though, because one also has to specify a bunch of arguably extraneous information as you've discovered.

Show
Meredith Rawls added a comment - That's true. At present, the registries do include info that is not in the dataId. The raw image registry has fields (id, hdu, instcal, wtmap, object, visit, taiObs, filter, ccdnum, dqmask, date, ccd, proposal, expTime) while the calib registry has fields (id, filter, validStart, path, calibDate, ccdnum, validEnd). Something occurs during ingestion to populate the registries with this information, and subsequent Butler calls can get data based on them. In practice, a common behind-the-scenes check that happens when processing begins is that the calibration product valid dates include the raw exposure's date. So I guess ideally there would be a test to see if whatever additional header metadata is read during ingestion (e.g., date) lands in the registry appropriately (e.g., try retrieving something with the Butler based on date alone?). It wouldn't surprise me if this fails, though, because one also has to specify a bunch of arguably extraneous information as you've discovered.
Hide
Krzysztof Findeisen added a comment - - edited

Hmm... most of those things aren't accessible through the Butler:

 >>> butler.getKeys('raw') {'date': , 'filter': , 'visit': ,  'hdu': , 'ccdnum': , 'object': } >>> butler.getKeys('cpFlat') {'calibDate': , 'filter': , 'ccdnum': } 

Show
Krzysztof Findeisen added a comment - - edited Hmm... most of those things aren't accessible through the Butler: >>> butler.getKeys('raw') {'date': <class 'str'>, 'filter': <class 'str'>, 'visit': <class 'int'>, 'hdu': <class 'int'>, 'ccdnum': <class 'int'>, 'object': <class 'str'>} >>> butler.getKeys('cpFlat') {'calibDate': <class 'str'>, 'filter': <class 'str'>, 'ccdnum': <class 'int'>}
Hide
Krzysztof Findeisen added a comment -

I ended up not adding a test, since it would probably fail due to DM-12672. I'll add a note to that ticket that the extra info you found would be a good regression test.

Show
Krzysztof Findeisen added a comment - I ended up not adding a test, since it would probably fail due to DM-12672 . I'll add a note to that ticket that the extra info you found would be a good regression test.

## People

• Assignee:
Krzysztof Findeisen
Reporter:
Krzysztof Findeisen
Reviewers:
Meredith Rawls
Watchers:
Krzysztof Findeisen, Meredith Rawls