Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-21840

Disallow NULL/None for implied dimension values in data IDs

    XMLWordPrintable

Details

    Description

      The Gen3 Registry's implied dimension dependencies (e.g. visit.physical_filter, exposure.visit) are allowed to be None, which translates nicely to NULL in SQL.  Unfortunately SQL NULLs and Python Nones compare differently, and we want the latter behavior in SQL queries.  That means we need to either

      • add special NULL-handling logic to the query-generation code
      • do our own mapping from Python None to sentinal values in all fetch/insert code
      • drop support for None/NULL in implied dimensions and make the associated foreign key columns in the database NOT NULL to support this.

      Attachments

        Issue Links

          Activity

            No builds found.
            jbosch Jim Bosch created issue -
            jbosch Jim Bosch made changes -
            Field Original Value New Value
            Epic Link DM-21254 [ 414685 ]
            jbosch Jim Bosch made changes -
            Link This issue relates to DM-21832 [ DM-21832 ]
            jbosch Jim Bosch made changes -
            Link This issue relates to DM-21836 [ DM-21836 ]
            jbosch Jim Bosch made changes -
            Labels gen3-middleware gen2-deprecation-blocker gen3-middleware
            yusra Yusra AlSayyad made changes -
            Epic Link DM-21254 [ 414685 ] DM-22586 [ 427653 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-22586 [ 427653 ] DM-23737 [ 431393 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-23737 [ 431393 ] DM-25268 [ 435627 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-25268 [ 435627 ] DM-26786 [ 439736 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-26786 [ 439736 ] DM-27956 [ 442730 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-27956 [ 442730 ] DM-29149 [ 458506 ]
            jbosch Jim Bosch added a comment -

            We've been working around this long enough (by making sure we have sentinal values for concepts like "this exposure has no physical_filter" and "this physical_filter has no band") that I think that's what we should keep doing, and reap the benefits of that by simplifying the butler code that (inconsistently) allowed implied dependencies to be None/NULL.  I'm going to give that a try on this ticket, though actually modifying the foreign key constraints to prohibit None/NULL will require a (trivial) migration, and I might punt that to another ticket.

             

            jbosch Jim Bosch added a comment - We've been working around this long enough (by making sure we have sentinal values for concepts like "this exposure has no physical_filter" and "this physical_filter has no band") that I think that's what we should keep doing, and reap the benefits of that by simplifying the butler code that (inconsistently) allowed implied dependencies to be None/NULL.  I'm going to give that a try on this ticket, though actually modifying the foreign key constraints to prohibit None/NULL will require a (trivial) migration, and I might punt that to another ticket.  
            jbosch Jim Bosch made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            jbosch Jim Bosch made changes -
            Status In Progress [ 3 ] To Do [ 10001 ]
            jbosch Jim Bosch added a comment -

            Not doing this right now after all: I was motivated to try it now because I thought it would remove the possibility that a DimensionRecord could be None in a data ID, and that would help out (very tangentially) with DM-29814.  But that's not true; one can also get a None DimensionRecord in at least one other way: by having a DimensionElement (e.g. visit_detector_region) that relates two other dimensions not exist for a particular combination.  It's possible we can reject that possibility, too, but that's more than I want to deal with on a "if I just did this, then the thing I care about is easier" project.

            jbosch Jim Bosch added a comment - Not doing this right now after all: I was motivated to try it now because I thought it would remove the possibility that a DimensionRecord could be None in a data ID, and that would help out (very tangentially) with DM-29814 .  But that's not true; one can also get a None DimensionRecord in at least one other way: by having a DimensionElement (e.g. visit_detector_region ) that relates two other dimensions not exist for a particular combination.  It's possible we can reject that possibility, too, but that's more than I want to deal with on a "if I just did this, then the thing I care about is easier" project.
            yusra Yusra AlSayyad made changes -
            Epic Link DM-29149 [ 458506 ] DM-30468 [ 509193 ]
            jbosch Jim Bosch made changes -
            Summary Handle NULL values for optional implied dependencies in query logic Disallow NULL/None for implied dimension values in data IDs
            yusra Yusra AlSayyad made changes -
            Epic Link DM-30468 [ 509193 ] DM-30536 [ 511194 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-30536 [ 511194 ] DM-30543 [ 511201 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-30543 [ 511201 ] DM-30551 [ 511217 ]
            jbosch Jim Bosch added a comment -

            tjenness, I'm proposing this as a late addition to the upcoming schema change: I just want to make some columns in various dimension tables NOT NULL, via some changes to the Python code (and a corresponding version bump in the dimensions manager, not the dimensions config). The values in all databases should all already be non-NULL, so this is a trivial migration.

            Note that this is sort of the opposite of what the very stale branch for this ticket tried (and failed) to do, which was to ban None in various places in Python without enforcing it on the database side. My plan now is to drop that branch and replace it with a much simpler one that only enforces database-side constraints, which don't suffer from the problems the Python attempt ran into (in addition to being much more rigorously enforced).

            jbosch Jim Bosch added a comment - tjenness , I'm proposing this as a late addition to the upcoming schema change: I just want to make some columns in various dimension tables NOT NULL, via some changes to the Python code (and a corresponding version bump in the dimensions manager, not the dimensions config). The values in all databases should all already be non-NULL, so this is a trivial migration. Note that this is sort of the opposite of what the very stale branch for this ticket tried (and failed) to do, which was to ban None in various places in Python without enforcing it on the database side. My plan now is to drop that branch and replace it with a much simpler one that only enforces database-side constraints, which don't suffer from the problems the Python attempt ran into (in addition to being much more rigorously enforced).
            jbosch Jim Bosch made changes -
            Link This issue relates to DM-33942 [ DM-33942 ]
            jbosch Jim Bosch made changes -
            Urgent? off
            Description The Gen3 Registry's implied dimension dependencies (e.g. visit.physical_filter, exposure.visit) are allowed to be None, which translates nicely to NULL in SQL.  Unfortunately SQL NULLs and Python Nones compare differently, and we want the latter behavior in SQL queries.  That means we need to either add special NULL-handling logic to the query-generation code, or do our own mapping from Python None to sentinal values in all fetch/insert code.  Right now it feels like the former is cleaner, but I'm open to opinions. The Gen3 Registry's implied dimension dependencies (e.g. visit.physical_filter, exposure.visit) are allowed to be None, which translates nicely to NULL in SQL.  Unfortunately SQL NULLs and Python Nones compare differently, and we want the latter behavior in SQL queries.  That means we need to either
            - add special NULL-handling logic to the query-generation code
            - do our own mapping from Python None to sentinal values in all fetch/insert code
            - drop support for None/NULL in implied dimensions and make the associated foreign key columns in the database NOT NULL to support this.
            Labels gen2-deprecation-blocker gen3-middleware gen3-middleware gen3-registry-incompatibility
            tjenness Tim Jenness added a comment -

            That seems fine to me. Do you want to make a patch directly to DM-33942 or make a patch here? It seems like if this is python-only then it's not overly critical where it sits.

            tjenness Tim Jenness added a comment - That seems fine to me. Do you want to make a patch directly to DM-33942 or make a patch here? It seems like if this is python-only then it's not overly critical where it sits.
            jbosch Jim Bosch added a comment -

            I've pushed a commit for this change to tickets/DM-21840 of daf_butler, which sits atop tickets/DM-33942 - I figure you'll just want to cherry-pick it. I'm running Jenkins now against that pair of branches, since this has the potential to break downstream unit tests by forcing them to be more careful about test data IDs (no more visits without physical filters).

            jbosch Jim Bosch added a comment - I've pushed a commit for this change to tickets/ DM-21840 of daf_butler, which sits atop tickets/ DM-33942 - I figure you'll just want to cherry-pick it. I'm running Jenkins now against that pair of branches, since this has the potential to break downstream unit tests by forcing them to be more careful about test data IDs (no more visits without physical filters).
            jbosch Jim Bosch made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            tjenness Tim Jenness added a comment -

            Okay. I can cherry pick when you tell me it's ready. It's likely that you will also need to include tickets/DM-34186 in your build since ingest tests rely on ObservationInfo having the new information available.

            tjenness Tim Jenness added a comment - Okay. I can cherry pick when you tell me it's ready. It's likely that you will also need to include tickets/ DM-34186 in your build since ingest tests rely on ObservationInfo having the new information available.
            jbosch Jim Bosch added a comment -

            This is turning up a lot of downstream breakage. My plan is to chase it all the way to the end, but then I'm going to split the daf_butler commit up into separate "fix tests" and "change schema" commits; I should then be able to merge the "fix tests" commit and all of the branches for other packages quickly, and you can cherry-pick the schema-change onto the DM-33942 and rebase it on top of the fixed tests. That should save you from having to juggle even more packages than you are already.

            jbosch Jim Bosch added a comment - This is turning up a lot of downstream breakage. My plan is to chase it all the way to the end, but then I'm going to split the daf_butler commit up into separate "fix tests" and "change schema" commits; I should then be able to merge the "fix tests" commit and all of the branches for other packages quickly, and you can cherry-pick the schema-change onto the DM-33942 and rebase it on top of the fixed tests. That should save you from having to juggle even more packages than you are already.
            jbosch Jim Bosch made changes -
            Link This issue relates to DM-34616 [ DM-34616 ]
            jbosch Jim Bosch added a comment -

            I've put the "fix tests" changes on DM-34616, which does not sit atop the schema changes, and will get it out for review shortly. Once that's merged, we can rebase the schema change branch and cherry-pick what will then be just one new commit on the branch for this ticket.

            jbosch Jim Bosch added a comment - I've put the "fix tests" changes on DM-34616 , which does not sit atop the schema changes, and will get it out for review shortly. Once that's merged, we can rebase the schema change branch and cherry-pick what will then be just one new commit on the branch for this ticket.
            yusra Yusra AlSayyad made changes -
            Epic Link DM-30551 [ 511217 ] DM-32153 [ 779823 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-32153 [ 779823 ] DM-32161 [ 779857 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-32161 [ 779857 ] DM-32168 [ 779882 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-32168 [ 779882 ] DM-32175 [ 779899 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-32175 [ 779899 ] DM-32182 [ 779906 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-32182 [ 779906 ] DM-32189 [ 779927 ]
            jbosch Jim Bosch made changes -
            Link This issue relates to DM-40818 [ DM-40818 ]
            yusra Yusra AlSayyad made changes -
            Epic Link DM-32189 [ 779927 ] DM-36891 [ 2451138 ]

            People

              jbosch Jim Bosch
              jbosch Jim Bosch
              Christopher Waters, Jim Bosch, John Parejko, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Jenkins

                  No builds found.