Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-21840

Disallow NULL/None for implied dimension values in data IDs

    XMLWordPrintable

    Details

      Description

      The Gen3 Registry's implied dimension dependencies (e.g. visit.physical_filter, exposure.visit) are allowed to be None, which translates nicely to NULL in SQL.  Unfortunately SQL NULLs and Python Nones compare differently, and we want the latter behavior in SQL queries.  That means we need to either

      • add special NULL-handling logic to the query-generation code
      • do our own mapping from Python None to sentinal values in all fetch/insert code
      • drop support for None/NULL in implied dimensions and make the associated foreign key columns in the database NOT NULL to support this.

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            We've been working around this long enough (by making sure we have sentinal values for concepts like "this exposure has no physical_filter" and "this physical_filter has no band") that I think that's what we should keep doing, and reap the benefits of that by simplifying the butler code that (inconsistently) allowed implied dependencies to be None/NULL.  I'm going to give that a try on this ticket, though actually modifying the foreign key constraints to prohibit None/NULL will require a (trivial) migration, and I might punt that to another ticket.

             

            Show
            jbosch Jim Bosch added a comment - We've been working around this long enough (by making sure we have sentinal values for concepts like "this exposure has no physical_filter" and "this physical_filter has no band") that I think that's what we should keep doing, and reap the benefits of that by simplifying the butler code that (inconsistently) allowed implied dependencies to be None/NULL.  I'm going to give that a try on this ticket, though actually modifying the foreign key constraints to prohibit None/NULL will require a (trivial) migration, and I might punt that to another ticket.  
            Hide
            jbosch Jim Bosch added a comment -

            Not doing this right now after all: I was motivated to try it now because I thought it would remove the possibility that a DimensionRecord could be None in a data ID, and that would help out (very tangentially) with DM-29814.  But that's not true; one can also get a None DimensionRecord in at least one other way: by having a DimensionElement (e.g. visit_detector_region) that relates two other dimensions not exist for a particular combination.  It's possible we can reject that possibility, too, but that's more than I want to deal with on a "if I just did this, then the thing I care about is easier" project.

            Show
            jbosch Jim Bosch added a comment - Not doing this right now after all: I was motivated to try it now because I thought it would remove the possibility that a DimensionRecord could be None in a data ID, and that would help out (very tangentially) with DM-29814 .  But that's not true; one can also get a None DimensionRecord in at least one other way: by having a DimensionElement (e.g. visit_detector_region ) that relates two other dimensions not exist for a particular combination.  It's possible we can reject that possibility, too, but that's more than I want to deal with on a "if I just did this, then the thing I care about is easier" project.
            Hide
            jbosch Jim Bosch added a comment -

            Tim Jenness, I'm proposing this as a late addition to the upcoming schema change: I just want to make some columns in various dimension tables NOT NULL, via some changes to the Python code (and a corresponding version bump in the dimensions manager, not the dimensions config). The values in all databases should all already be non-NULL, so this is a trivial migration.

            Note that this is sort of the opposite of what the very stale branch for this ticket tried (and failed) to do, which was to ban None in various places in Python without enforcing it on the database side. My plan now is to drop that branch and replace it with a much simpler one that only enforces database-side constraints, which don't suffer from the problems the Python attempt ran into (in addition to being much more rigorously enforced).

            Show
            jbosch Jim Bosch added a comment - Tim Jenness , I'm proposing this as a late addition to the upcoming schema change: I just want to make some columns in various dimension tables NOT NULL, via some changes to the Python code (and a corresponding version bump in the dimensions manager, not the dimensions config). The values in all databases should all already be non-NULL, so this is a trivial migration. Note that this is sort of the opposite of what the very stale branch for this ticket tried (and failed) to do, which was to ban None in various places in Python without enforcing it on the database side. My plan now is to drop that branch and replace it with a much simpler one that only enforces database-side constraints, which don't suffer from the problems the Python attempt ran into (in addition to being much more rigorously enforced).
            Hide
            tjenness Tim Jenness added a comment -

            That seems fine to me. Do you want to make a patch directly to DM-33942 or make a patch here? It seems like if this is python-only then it's not overly critical where it sits.

            Show
            tjenness Tim Jenness added a comment - That seems fine to me. Do you want to make a patch directly to DM-33942 or make a patch here? It seems like if this is python-only then it's not overly critical where it sits.
            Hide
            jbosch Jim Bosch added a comment -

            I've pushed a commit for this change to tickets/DM-21840 of daf_butler, which sits atop tickets/DM-33942 - I figure you'll just want to cherry-pick it. I'm running Jenkins now against that pair of branches, since this has the potential to break downstream unit tests by forcing them to be more careful about test data IDs (no more visits without physical filters).

            Show
            jbosch Jim Bosch added a comment - I've pushed a commit for this change to tickets/ DM-21840 of daf_butler, which sits atop tickets/ DM-33942 - I figure you'll just want to cherry-pick it. I'm running Jenkins now against that pair of branches, since this has the potential to break downstream unit tests by forcing them to be more careful about test data IDs (no more visits without physical filters).
            Hide
            tjenness Tim Jenness added a comment -

            Okay. I can cherry pick when you tell me it's ready. It's likely that you will also need to include tickets/DM-34186 in your build since ingest tests rely on ObservationInfo having the new information available.

            Show
            tjenness Tim Jenness added a comment - Okay. I can cherry pick when you tell me it's ready. It's likely that you will also need to include tickets/ DM-34186 in your build since ingest tests rely on ObservationInfo having the new information available.
            Hide
            jbosch Jim Bosch added a comment -

            This is turning up a lot of downstream breakage. My plan is to chase it all the way to the end, but then I'm going to split the daf_butler commit up into separate "fix tests" and "change schema" commits; I should then be able to merge the "fix tests" commit and all of the branches for other packages quickly, and you can cherry-pick the schema-change onto the DM-33942 and rebase it on top of the fixed tests. That should save you from having to juggle even more packages than you are already.

            Show
            jbosch Jim Bosch added a comment - This is turning up a lot of downstream breakage. My plan is to chase it all the way to the end, but then I'm going to split the daf_butler commit up into separate "fix tests" and "change schema" commits; I should then be able to merge the "fix tests" commit and all of the branches for other packages quickly, and you can cherry-pick the schema-change onto the DM-33942 and rebase it on top of the fixed tests. That should save you from having to juggle even more packages than you are already.
            Hide
            jbosch Jim Bosch added a comment -

            I've put the "fix tests" changes on DM-34616, which does not sit atop the schema changes, and will get it out for review shortly. Once that's merged, we can rebase the schema change branch and cherry-pick what will then be just one new commit on the branch for this ticket.

            Show
            jbosch Jim Bosch added a comment - I've put the "fix tests" changes on DM-34616 , which does not sit atop the schema changes, and will get it out for review shortly. Once that's merged, we can rebase the schema change branch and cherry-pick what will then be just one new commit on the branch for this ticket.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Watchers:
              Christopher Waters, Jim Bosch, John Parejko, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Dates

                Created:
                Updated:

                  Jenkins

                  No builds found.