Details
Type: Story
Status: Done
Resolution: Done
Fix Version/s: None
Component/s: daf_butler, obs_base
Story Points: 4
Team: Architecture
Urgent?: No
Description
Robert Lupton has requested that the first sequence number in a visit be added to the visit dimension record definition in the gen3 registry schema. We also wish to store the group_id to allow grouping of visits.
In the general case, this will allow a user to specify a raw dataId (day_obs + seq_num) that can also be used to retrieve the corresponding calexp, as long as they specify the first raw in the visit and not the second.
day_obs is already in the visit definition.
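As a minimal sketch of the proposed lookup: the visit's seq_num is taken from its lowest-numbered exposure, and a raw day_obs+seq_num then matches the visit only when it names that first exposure. The record types (ExposureRecord, VisitRecord) and helpers (define_visit, find_visit) are hypothetical simplifications, not the daf_butler dimension record API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureRecord:
    """Hypothetical stand-in for an exposure dimension record."""
    id: int
    day_obs: int
    seq_num: int
    group_id: str


@dataclass(frozen=True)
class VisitRecord:
    """Hypothetical visit record carrying the proposed seq_num and group_id fields."""
    id: int
    day_obs: int
    seq_num: int  # first sequence number among the visit's exposures
    group_id: str


def define_visit(visit_id: int, exposures: list[ExposureRecord]) -> VisitRecord:
    """Build a visit record from its member exposure records."""
    if not exposures:
        raise ValueError("a visit needs at least one exposure")
    # The lowest seq_num is the first raw in the visit.
    first = min(exposures, key=lambda e: e.seq_num)
    return VisitRecord(
        id=visit_id, day_obs=first.day_obs, seq_num=first.seq_num, group_id=first.group_id
    )


def find_visit(visits: list[VisitRecord], day_obs: int, seq_num: int) -> list[VisitRecord]:
    """Return every visit whose first exposure matches a raw day_obs+seq_num."""
    return [v for v in visits if v.day_obs == day_obs and v.seq_num == seq_num]
```

With two snaps at seq_num 41 and 42, the visit records seq_num 41, so a dataId using seq_num 41 finds the visit while one using seq_num 42 does not.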
There are some caveats:
- If a registry has multiple visit_system definitions, day_obs+seq_num may not uniquely identify a calexp.
- Visit definitions are built from exposure dimension records (they do not look at files), so determining the first sequence number in a visit depends on all of the visit's members having been ingested.
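The first caveat can be illustrated with another hypothetical sketch. If two visit systems each define a visit that begins with the same exposure, day_obs+seq_num alone matches both, and the caller has to name the visit system to disambiguate. The lookup function and the layout of VisitRecord are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VisitRecord:
    """Hypothetical visit record; visit_system identifies the visit definition scheme."""
    id: int
    visit_system: int
    day_obs: int
    seq_num: int  # proposed field: first sequence number in the visit


def lookup(visits, day_obs, seq_num, visit_system=None):
    """Return the single visit starting at day_obs+seq_num, or None if there is none.

    Raises LookupError when the dataId is ambiguous, e.g. because several
    visit systems define a visit beginning with the same exposure.
    """
    matches = [
        v
        for v in visits
        if v.day_obs == day_obs
        and v.seq_num == seq_num
        and (visit_system is None or v.visit_system == visit_system)
    ]
    if len(matches) > 1:
        raise LookupError(
            f"day_obs={day_obs} seq_num={seq_num} matches {len(matches)} visits; "
            "specify a visit_system"
        )
    return matches[0] if matches else None
```

For example, a one-exposure-per-visit system and a snap-grouping system would both produce a visit starting at seq_num 41; only passing visit_system selects one of them.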
To make this robust, I think ObservationInfo will have to be modified to include the first_seq and end_seq concepts from CAP-763. This would also require adding first_seq and end_seq to the exposure dimension record: without the former we can't define the visit, and without the latter the tooling can't warn about possibly missing exposures in the visit definition.
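A sketch of how first_seq and end_seq could make the visit definition robust. It assumes, hypothetically, that both values are stored on each exposure record as the inclusive seq_num range of the sequence the exposure belongs to; the exact CAP-763 semantics are not spelled out here. The visit's first sequence number then comes from metadata rather than from whichever exposures happen to be ingested, and the difference between that range and the ingested seq_nums is what tooling would warn about.

```python
from __future__ import annotations

import warnings
from dataclasses import dataclass


@dataclass(frozen=True)
class ExposureRecord:
    """Hypothetical exposure record extended with first_seq/end_seq.

    first_seq and end_seq are assumed to be the inclusive seq_num range of
    the observing sequence this exposure belongs to.
    """
    seq_num: int
    first_seq: int
    end_seq: int


def visit_first_seq(exposures: list[ExposureRecord]) -> int:
    """Return the visit's first seq_num, warning if members appear to be missing."""
    if not exposures:
        raise ValueError("a visit needs at least one exposure")
    first_seqs = {e.first_seq for e in exposures}
    end_seqs = {e.end_seq for e in exposures}
    if len(first_seqs) != 1 or len(end_seqs) != 1:
        raise ValueError("exposures disagree about the sequence they belong to")
    first_seq, end_seq = first_seqs.pop(), end_seqs.pop()
    # Expected members minus the ones actually ingested.
    missing = set(range(first_seq, end_seq + 1)) - {e.seq_num for e in exposures}
    if missing:
        warnings.warn(f"visit is missing exposures with seq_num {sorted(missing)}")
    # first_seq comes from metadata, so it is correct even if that exposure is missing.
    return first_seq
```

If only the second snap (seq_num 42) of a 41–42 sequence has been ingested, this still returns 41 and warns that 41 is missing.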
It may be worth noting that we already have other visit fields that require at least one detector from each snap to be ingested before the visit is constructed: visit.timespan and visit.exposure_time come to mind. I don't dispute that they are problematic too, but unfortunately they are also harder to fix with metadata additions.
So either we don't need to consider this ticket blocked by metadata additions (because we already require at least one detector from each snap to be ingested), or we should work out a bigger schema change that resolves the full problem (maybe we could just drop those temporal fields from visit and nobody would care, especially if queries automatically join in exposure for temporal information). Another alternative is running UPDATE queries on the visit table once all of the information has landed, which is conceptually yucky but may not be a huge problem in practice.
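The UPDATE alternative could look roughly like the following. This uses sqlite3 and a simplified, hypothetical schema, not the real registry tables. The derived visit columns are recomputed from whatever exposures have landed, so the statement can simply be rerun after a late ingest. Treating visit.exposure_time as the sum over snaps is an assumption.

```python
import sqlite3

# Hypothetical, simplified schema: not the actual daf_butler registry tables.
SCHEMA = """
CREATE TABLE visit (
    id INTEGER PRIMARY KEY,
    seq_num INTEGER,
    timespan_begin REAL,
    timespan_end REAL,
    exposure_time REAL
);
CREATE TABLE exposure (
    id INTEGER PRIMARY KEY,
    visit INTEGER REFERENCES visit(id),
    seq_num INTEGER,
    timespan_begin REAL,
    timespan_end REAL,
    exposure_time REAL
);
"""

# Recompute every derived visit field from the exposures ingested so far.
# exposure_time is assumed here to be the sum over the visit's snaps.
REFRESH_VISITS = """
UPDATE visit SET
    seq_num = (SELECT MIN(e.seq_num) FROM exposure e WHERE e.visit = visit.id),
    timespan_begin = (SELECT MIN(e.timespan_begin) FROM exposure e WHERE e.visit = visit.id),
    timespan_end = (SELECT MAX(e.timespan_end) FROM exposure e WHERE e.visit = visit.id),
    exposure_time = (SELECT SUM(e.exposure_time) FROM exposure e WHERE e.visit = visit.id)
"""


def refresh_visits(conn: sqlite3.Connection) -> None:
    """Rerun the derivation in one transaction; safe to call after every ingest."""
    with conn:  # commits on success, rolls back on error
        conn.execute(REFRESH_VISITS)
```

Because each run recomputes from the exposure rows rather than patching the previous values, ingesting the first snap after the second and rerunning refresh_visits corrects seq_num, the timespan and exposure_time together.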