Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-16160

Fix constraint check in associate method

    XMLWordPrintable

    Details

      Description

      My first attempt to implement constraint check in associate() method (DM-15686) apparently broke everything (DM-16159). Looks like code in obs_subaru depends on uniqueness check to be done in addDataset() method. My commit moved that check to associate() which is apparently too late, tha fix for that it to leave the test in addDataset(). IT would be interesting to also understand what is going on in obs_subaru because in the future we'll probably switch to table-level constraints and that can break things again.

        Attachments

          Issue Links

            Activity

            Hide
            salnikov Andy Salnikov added a comment - - edited

            The error in obs_subaru test shows like this:

            ERROR: testOnConflictIgnore (__main__.HscIngestTestCase)
            ----------------------------------------------------------------------
            Traceback (most recent call last):
              File "tests/test_ingest.py", line 120, in testOnConflictIgnore
                self.runIngest()   # this ong should silently fail
              File "tests/test_ingest.py", line 78, in runIngest
                task.run(files)
              File "/software/lsstsw/stack3_20171023/stack/miniconda3-4.3.21-10a4fa6/Linux64/obs_base/16.0-14-g71e547a+4/python/lsst/obs/base/gen3/ingest.py", line 170, in run
                self.processFile(os.path.abspath(file))
              File "/software/lsstsw/stack3_20171023/stack/miniconda3-4.3.21-10a4fa6/Linux64/obs_base/16.0-14-g71e547a+4/python/lsst/obs/base/gen3/ingest.py", line 330, in processFile
                raise
              File "/software/lsstsw/stack3_20171023/python/miniconda3-4.3.21/lib/python3.6/contextlib.py", line 88, in __exit__
                next(self.gen)
              File "/project/salnikov/DM-15686/daf_butler/python/lsst/daf/butler/butler.py", line 199, in transaction
                yield
              File "/software/lsstsw/stack3_20171023/python/miniconda3-4.3.21/lib/python3.6/contextlib.py", line 88, in __exit__
                next(self.gen)
              File "/project/salnikov/DM-15686/daf_butler/python/lsst/daf/butler/registries/sqlRegistry.py", line 99, in transaction
                trans.commit()
              File "/software/lsstsw/stack3_20171023/stack/miniconda3-4.3.21-10a4fa6/Linux64/sqlalchemy/1.2.2+2/lib/python/SQLAlchemy-1.2.2-py3.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1642, in commit
                raise exc.InvalidRequestError("This transaction is inactive")
            sqlalchemy.exc.InvalidRequestError: This transaction is inactive
            

            which looks like transaction commit after transaction abort. IT could be because of our transaction nesting in registry - addDataset() is @transactional and in turn calls associate() which is @transactional too. I need to understand how that nesting works in different scenarios. And looking at the implementation of @transactional it may be enough in this case to pass transactional=False when calling associate() from addDataset().

            Show
            salnikov Andy Salnikov added a comment - - edited The error in obs_subaru test shows like this: ERROR: testOnConflictIgnore (__main__.HscIngestTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "tests/test_ingest.py", line 120, in testOnConflictIgnore self.runIngest() # this ong should silently fail File "tests/test_ingest.py", line 78, in runIngest task.run(files) File "/software/lsstsw/stack3_20171023/stack/miniconda3-4.3.21-10a4fa6/Linux64/obs_base/16.0-14-g71e547a+4/python/lsst/obs/base/gen3/ingest.py", line 170, in run self.processFile(os.path.abspath(file)) File "/software/lsstsw/stack3_20171023/stack/miniconda3-4.3.21-10a4fa6/Linux64/obs_base/16.0-14-g71e547a+4/python/lsst/obs/base/gen3/ingest.py", line 330, in processFile raise File "/software/lsstsw/stack3_20171023/python/miniconda3-4.3.21/lib/python3.6/contextlib.py", line 88, in __exit__ next(self.gen) File "/project/salnikov/DM-15686/daf_butler/python/lsst/daf/butler/butler.py", line 199, in transaction yield File "/software/lsstsw/stack3_20171023/python/miniconda3-4.3.21/lib/python3.6/contextlib.py", line 88, in __exit__ next(self.gen) File "/project/salnikov/DM-15686/daf_butler/python/lsst/daf/butler/registries/sqlRegistry.py", line 99, in transaction trans.commit() File "/software/lsstsw/stack3_20171023/stack/miniconda3-4.3.21-10a4fa6/Linux64/sqlalchemy/1.2.2+2/lib/python/SQLAlchemy-1.2.2-py3.6-linux-x86_64.egg/sqlalchemy/engine/base.py", line 1642, in commit raise exc.InvalidRequestError("This transaction is inactive") sqlalchemy.exc.InvalidRequestError: This transaction is inactive which looks like transaction commit after transaction abort. IT could be because of our transaction nesting in registry - addDataset() is @transactional and in turn calls associate() which is @transactional too. I need to understand how that nesting works in different scenarios. And looking at the implementation of @transactional it may be enough in this case to pass transactional=False when calling associate() from addDataset() .
            Hide
            jbosch Jim Bosch added a comment -

            I had almost exactly the same reasoning, and even tried just passing transactional=False when calling associate.  That was not enough on its own (it caused a different problem, because it caused the addDataset to be committed even if the association was not).  It may work, however, to both do that and move the integrity check before addDataset.

             

            Show
            jbosch Jim Bosch added a comment - I had almost exactly the same reasoning, and even tried just passing transactional=False when calling associate .  That was not enough on its own (it caused a different problem, because it caused the addDataset to be committed even if the association was not).  It may work, however, to both do that and move the integrity check before addDataset .  
            Hide
            salnikov Andy Salnikov added a comment -

            After adding transactional=False to associate call the exception above disappeared, but there is now a different failure in the same test:

            FAIL: testOnConflictIgnore (__main__.HscIngestTestCase)
            ----------------------------------------------------------------------
            Traceback (most recent call last):
              File "tests/test_ingest.py", line 122, in testOnConflictIgnore
                self.assertEqual(n1, n2)
            AssertionError: {'COUNT(*)': 12} != {'COUNT(*)': 13}
            - {'COUNT(*)': 12}
            ?               ^
             
            + {'COUNT(*)': 13}
            ?               ^
            

            Looks like part of a transaction had succeeded, need to debug that, I guess.

            Show
            salnikov Andy Salnikov added a comment - After adding transactional=False to associate call the exception above disappeared, but there is now a different failure in the same test: FAIL: testOnConflictIgnore (__main__.HscIngestTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "tests/test_ingest.py", line 122, in testOnConflictIgnore self.assertEqual(n1, n2) AssertionError: {'COUNT(*)': 12} != {'COUNT(*)': 13} - {'COUNT(*)': 12} ? ^   + {'COUNT(*)': 13} ? ^ Looks like part of a transaction had succeeded, need to debug that, I guess.
            Hide
            salnikov Andy Salnikov added a comment -

            OK, I guess I see the same error as Jim now. I'm still curious to understand what is happening. my idea is that with transactional=False the transaction should be aborted cleanly by the addDataset() wrapper. Will spend some time to figure out how it works.

            Show
            salnikov Andy Salnikov added a comment - OK, I guess I see the same error as Jim now. I'm still curious to understand what is happening. my idea is that with transactional=False the transaction should be aborted cleanly by the addDataset() wrapper. Will spend some time to figure out how it works.
            Hide
            jbosch Jim Bosch added a comment -

            I think the logic is trying to essentially ignore conflicts and proceed with ingesting further datasets.  The problem is that we are doing this by catching an exception raised by SQLAlchemy in the case of a conflict, but when SQLAlchemy raises it's already started the rollback.

            Show
            jbosch Jim Bosch added a comment - I think the logic is trying to essentially ignore conflicts and proceed with ingesting further datasets.  The problem is that we are doing this by catching an exception raised by SQLAlchemy in the case of a conflict, but when SQLAlchemy raises it's already started the rollback.
            Hide
            salnikov Andy Salnikov added a comment -

            There is something very disturbing that I see in obs.base.gen3.ingest.RawIngestTask.ingestFile()

                        # We use transactional=False here (a kwarg added by the
                        # @transactional decorator) to keep the conflict exception from
                        # starting a higher-level rollback - if we catch this exception,
                        # we don't want to have already started rolling back the ingest of
                        # *previous* files when config.onError=='rollback' but
                        # config.confict=='ignore'.
                        ref = self.butler.registry.addDataset(self.datasetType, dataId, run=run,
                                                              transactional=False, recursive=True)
            

            (I think this is why Jim says it's hard to fix without redesigning the core stuff). I guess assumption here is that when exception happens in addDataset() then registry remains unmodified. I think it's hard/impossible to implement such a guarantee in general. I think RawIngestTask approach to its own exception handling needs to be changed to rely more on registry's transaction management rather than trying to mange it on its own. But I have very little idea what the requirements and expected failure options here are. And maybe RawIngestTask behavior is actually OK, but some assumptions can be relaxed and unit test can be modified accordingly.

            I'll add pre-insert test back to addDataset() as a workaround, this should fix our current problem, but in general transaction management will need some work.

            Show
            salnikov Andy Salnikov added a comment - There is something very disturbing that I see in obs.base.gen3.ingest.RawIngestTask.ingestFile() # We use transactional=False here (a kwarg added by the # @transactional decorator) to keep the conflict exception from # starting a higher-level rollback - if we catch this exception, # we don't want to have already started rolling back the ingest of # *previous* files when config.onError=='rollback' but # config.confict=='ignore'. ref = self .butler.registry.addDataset( self .datasetType, dataId, run = run, transactional = False , recursive = True ) (I think this is why Jim says it's hard to fix without redesigning the core stuff). I guess assumption here is that when exception happens in addDataset() then registry remains unmodified. I think it's hard/impossible to implement such a guarantee in general. I think RawIngestTask approach to its own exception handling needs to be changed to rely more on registry's transaction management rather than trying to mange it on its own. But I have very little idea what the requirements and expected failure options here are. And maybe RawIngestTask behavior is actually OK, but some assumptions can be relaxed and unit test can be modified accordingly. I'll add pre-insert test back to addDataset() as a workaround, this should fix our current problem, but in general transaction management will need some work.
            Hide
            jbosch Jim Bosch added a comment -

            I agree 100% with all of that (both diagnosis and workaround).  I suspect we will need to start using explicit ON CONFLICT instructions or something, or perhaps scale back what we think we can promise in terms of guarantees in ingest.

            Show
            jbosch Jim Bosch added a comment - I agree 100% with all of that (both diagnosis and workaround).  I suspect we will need to start using explicit ON CONFLICT instructions or something, or perhaps scale back what we think we can promise in terms of guarantees in ingest.
            Hide
            salnikov Andy Salnikov added a comment -

            Replying to Jim's comment above - in this case SQLAlchemy does not start a rollback, RawIngestTask does its own transaction management and it asks registry not to do any transaction management by passing transactional=False to addDataset. RawIngestTask catches ValueError though (may not be the best exception type in this case as ValueError could happen for some other reasons) and tries to do smatr things then but assumption (in unit test at least) is that registry is not modified in that case.
            If we move to checks based on database constraints the situation may become more complicated. Some checks/failures could only happen when transaction is committed, meaning that addDataset() won't be able to detect anything or raise its exception, and RawIngestTask will need to do something when it tries to commit its "large" transaction. In that respect I think having one large transaction is even more problematic, it easier to handle shorter transaction which could fail than one huge transaction.

            Show
            salnikov Andy Salnikov added a comment - Replying to Jim's comment above - in this case SQLAlchemy does not start a rollback, RawIngestTask does its own transaction management and it asks registry not to do any transaction management by passing transactional=False to addDataset . RawIngestTask catches ValueError though (may not be the best exception type in this case as ValueError could happen for some other reasons) and tries to do smatr things then but assumption (in unit test at least) is that registry is not modified in that case. If we move to checks based on database constraints the situation may become more complicated. Some checks/failures could only happen when transaction is committed, meaning that addDataset() won't be able to detect anything or raise its exception, and RawIngestTask will need to do something when it tries to commit its "large" transaction. In that respect I think having one large transaction is even more problematic, it easier to handle shorter transaction which could fail than one huge transaction.
            Hide
            salnikov Andy Salnikov added a comment -

            Next attempt is here, this time with Jenkins (centos builds finished long time ago, macox is still in progress, but obs_subaru has passed)! There are two commits on this PR, one is identical to my yesterday's and second applies two small fixes on top of that.

            And I am terribly sorry for yesterday screw up!

            Show
            salnikov Andy Salnikov added a comment - Next attempt is here, this time with Jenkins (centos builds finished long time ago, macox is still in progress, but obs_subaru has passed)! There are two commits on this PR, one is identical to my yesterday's and second applies two small fixes on top of that. And I am terribly sorry for yesterday screw up!
            Hide
            jbosch Jim Bosch added a comment - - edited

            Looks good.  Better fix is DM-16221.

            Show
            jbosch Jim Bosch added a comment - - edited Looks good.  Better fix is DM-16221 .
            Hide
            salnikov Andy Salnikov added a comment -

            Thanks for review! Merged and done.

            Show
            salnikov Andy Salnikov added a comment - Thanks for review! Merged and done.

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Reviewers:
              Jim Bosch
              Watchers:
              Andy Salnikov, Jim Bosch, Vaikunth Thukral
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  CI Builds

                  No builds found.