Details
-
Type:
Story
-
Status: Done
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: ctrl_mpexec
-
Story Points:1
-
Epic Link:
-
Sprint:DB_F20_06
-
Team:Data Access and Database
-
Urgent?:No
Description
w_35 of the software stack + ci_hsc_gen3 dies in pipeline.sh with
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
|
Monika Adamow first generated the error and I was able to duplicate it. While a full run has not yet completed using -j 1, it was run far enough to get through the first few PipelineTasks without having this error.
No error messages from previous steps in scons. Full output of pipeline.sh:
|
bin/pipeline.sh 8 /scratch/mgower/weekly_ci_hsc_gen3/postgres/git/ci_hsc_gen3/DATA
|
ctrl.mpexec.cmdLineFwk INFO: QuantumGraph contains 155 quanta for 12 tasks
|
ctrl.mpexec.cmdLineFwk INFO: QuantumGraph contains 155 quanta for 12 tasks
|
Process task-0:
|
Traceback (most recent call last):
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
|
cursor, statement, parameters, context
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
|
cursor.execute(statement, parameters)
|
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
|
|
|
The above exception was the direct cause of the following exception:
|
Traceback (most recent call last):
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
|
self.run()
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 99, in run
|
self._target(*self._args, **self._kwargs)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-13-g37abb6e+24c9b31241/python/lsst/ctrl/mpexec/singleQuantumExecutor.py", line 83, in execute
|
self.initGlobals(quantum, butler)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-13-g37abb6e+24c9b31241/python/lsst/ctrl/mpexec/singleQuantumExecutor.py", line 273, in initGlobals
|
Instrument.fromName(instrument, butler.registry)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/obs_base/20.0.0-37-g38a3e24+ce95e46117/python/lsst/obs/base/_instrument.py", line 219, in fromName
|
records = list(registry.queryDimensionRecords("instrument", instrument=name))
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/_registry.py", line 1402, in queryDimensionRecords
|
where=where, components=components, **kwargs)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/_registry.py", line 1324, in queryDataIds
|
standardizedDataId = self.expandDataId(dataId, **kwargs)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/_registry.py", line 851, in expandDataId
|
fetched = tuple(storage.fetch(dataIdSet))
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/dimensions/caching.py", line 111, in fetch
|
for record in self._nested.fetch(toFetch):
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/dimensions/table.py", line 127, in fetch
|
for row in self._db.query(query.combine()):
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/interfaces/_database.py", line 1373, in query
|
return self._connection.execute(sql, *args, **kwds)
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 984, in execute
|
return meth(self, multiparams, params)
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
|
return connection._execute_clauseelement(self, multiparams, params)
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1103, in _execute_clauseelement
|
distilled_params,
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context
|
e, statement, parameters, cursor, context
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception
|
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
|
raise exception
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
|
cursor, statement, parameters, context
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
|
cursor.execute(statement, parameters)
|
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL error: decryption failed or bad record mac
|
|
[SQL: SELECT svclsstdbmgower1.instrument.name, svclsstdbmgower1.instrument.visit_max, svclsstdbmgower1.instrument.exposure_max, svclsstdbmgower1.instrument.detector_max, svclsstdbmgower1.instrument.class_name
|
FROM svclsstdbmgower1.instrument
|
WHERE svclsstdbmgower1.instrument.name = %(name_1)s]
|
[parameters: {'name_1': 'HSC'}]
|
(Background on this error at: http://sqlalche.me/e/e3q8)
|
Process task-1:
|
Traceback (most recent call last):
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
|
cursor, statement, parameters, context
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
|
cursor.execute(statement, parameters)
|
psycopg2.OperationalError: SSL error: decryption failed or bad record mac
|
|
|
The above exception was the direct cause of the following exception:
|
|
Traceback (most recent call last):
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
|
self.run()
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/multiprocessing/process.py", line 99, in run
|
self._target(*self._args, **self._kwargs)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-13-g37abb6e+24c9b31241/python/lsst/ctrl/mpexec/singleQuantumExecutor.py", line 83, in execute
|
self.initGlobals(quantum, butler)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/ctrl_mpexec/20.0.0-13-g37abb6e+24c9b31241/python/lsst/ctrl/mpexec/singleQuantumExecutor.py", line 273, in initGlobals
|
Instrument.fromName(instrument, butler.registry)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/obs_base/20.0.0-37-g38a3e24+ce95e46117/python/lsst/obs/base/_instrument.py", line 219, in fromName
|
records = list(registry.queryDimensionRecords("instrument", instrument=name))
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/_registry.py", line 1402, in queryDimensionRecords
|
where=where, components=components, **kwargs)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/_registry.py", line 1324, in queryDataIds
|
standardizedDataId = self.expandDataId(dataId, **kwargs)
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/_registry.py", line 851, in expandDataId
|
fetched = tuple(storage.fetch(dataIdSet))
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/dimensions/caching.py", line 111, in fetch
|
for record in self._nested.fetch(toFetch):
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/dimensions/table.py", line 127, in fetch
|
for row in self._db.query(query.combine()):
|
File "/software/lsstsw/stack_20200515/stack/miniconda3-4.7.12-46b24e8/Linux64/daf_butler/19.0.0-148-g38150800+de92faa26f/python/lsst/daf/butler/registry/interfaces/_database.py", line 1373, in query
|
return self._connection.execute(sql, *args, **kwds)
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 984, in execute
|
return meth(self, multiparams, params)
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/sql/elements.py", line 293, in _execute_on_connection
|
return connection._execute_clauseelement(self, multiparams, params)
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1103, in _execute_clauseelement
|
distilled_params,
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1288, in _execute_context
|
e, statement, parameters, cursor, context
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1482, in _handle_dbapi_exception
|
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
|
raise exception
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/base.py", line 1248, in _execute_context
|
cursor, statement, parameters, context
|
File "/software/lsstsw/stack_20200515/python/miniconda3-4.7.12/envs/lsst-scipipe/lib/python3.7/site-packages/sqlalchemy/engine/default.py", line 590, in do_execute
|
cursor.execute(statement, parameters)
|
sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) SSL error: decryption failed or bad record mac
|
The error seems to repeat multiple times and then we get lines like:
trl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(MakeWarpTask, label=makeWarpTask) dataId={abstract_filter: r, instrument: HSC, skymap: discrete/ci_hsc, physical_filter: HSC-R, tract: 0, visit_system: 0, patch: 69, visit: 903342}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(MakeWarpTask, label=makeWarpTask) dataId={abstract_filter: i, instrument: HSC, skymap: discrete/ci_hsc, physical_filter: HSC-I, tract: 0, visit_system: 0, patch: 69, visit: 903986}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(MakeWarpTask, label=makeWarpTask) dataId={abstract_filter: r, instrument: HSC, skymap: discrete/ci_hsc, physical_filter: HSC-R, tract: 0, visit_system: 0, patch: 69, visit: 903344}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(CompareWarpAssembleCoaddTask, label=assembleCoadd) dataId={abstract_filter: r, skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(CompareWarpAssembleCoaddTask, label=assembleCoadd) dataId={abstract_filter: i, skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(DetectCoaddSourcesTask, label=detection) dataId={abstract_filter: r, skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(DetectCoaddSourcesTask, label=detection) dataId={abstract_filter: i, skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(MergeDetectionsTask, label=mergeDetections) dataId={skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(DeblendCoaddSourcesSingleTask, label=deblend) dataId={abstract_filter: r, skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(DeblendCoaddSourcesSingleTask, label=deblend) dataId={abstract_filter: i, skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(MeasureMergedCoaddSourcesTask, label=measure) dataId={abstract_filter: r, skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(MeasureMergedCoaddSourcesTask, label=measure) dataId={abstract_filter: i, skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
ctrl.mpexec.mpGraphExecutor ERROR: Upstream job failed for task <TaskDef(MergeMeasurementsTask, label=mergeMeasurements) dataId={skymap: discrete/ci_hsc, tract: 0, patch: 69}>, skipping this task.
|
It looks like it was
DM-26136then. In that case I've assigned this ticket to Andy Salnikov.