# validate_drp cfht/decam datasets timing out post pybind11 merge

## Details

• Type: Story
• Status: To Do
• Resolution: Unresolved
• Fix Version/s: None
• Component/s: None
• Labels:
None
• Team:
SQuaRE

## Description

The cfht and decam datasets have timed out several times since the pybind11 merge. E.g., https://ci.lsst.codes/job/validate_drp/851/

The runtime before failure for both datasets is ~2 hours 47 min. The failures have not been conclusively tied to the pybind11 merge, but the timing coincides.

There have been recent changes to both pipe_tasks and pipe_drivers.

    Traceback (most recent call last):
      File "/home/jenkins-slave/workspace/validate_drp/dataset/cfht/label/centos-7/python/py2/lsstsw/stack/Linux64/pipe_tasks/13.0-4-gedc5e2a/bin/processCcd.py", line 25, in <module>
        ProcessCcdTask.parseAndRun()
      File "/home/jenkins-slave/workspace/validate_drp/dataset/cfht/label/centos-7/python/py2/lsstsw/stack/Linux64/pipe_base/13.0+5/python/lsst/pipe/base/cmdLineTask.py", line 482, in parseAndRun
        resultList = taskRunner.run(parsedCmd)
      File "/home/jenkins-slave/workspace/validate_drp/dataset/cfht/label/centos-7/python/py2/lsstsw/stack/Linux64/pipe_base/13.0+5/python/lsst/pipe/base/cmdLineTask.py", line 209, in run
        resultList = list(mapFunc(self, targetList))
      File "/home/jenkins-slave/workspace/validate_drp/dataset/cfht/label/centos-7/python/py2/lsstsw/stack/Linux64/pipe_base/13.0+5/python/lsst/pipe/base/cmdLineTask.py", line 70, in _runPool
        return pool.map_async(functools.partial(_poolFunctionWrapper, function), iterable).get(timeout)
      File "/home/jenkins-slave/workspace/validate_drp/dataset/cfht/label/centos-7/python/py2/lsstsw/miniconda/lib/python2.7/multiprocessing/pool.py", line 563, in get
        raise TimeoutError
    multiprocessing.TimeoutError
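The failing frame is a `map_async(...).get(timeout)` call, which raises `multiprocessing.TimeoutError` when the mapped results are not all ready within `timeout` seconds. The following is a minimal Python 3 sketch of that behaviour, not the pipe_base code: `slow_square`, `run_pool`, and `timed_out` are hypothetical names, and a `ThreadPool` stands in for the process pool so the example stays small (it exposes the same `map_async`/`get` interface).

```python
import functools
import multiprocessing
import time
from multiprocessing.pool import ThreadPool


def slow_square(delay, x):
    """Hypothetical stand-in for per-CCD work: sleep `delay` seconds, return x*x."""
    time.sleep(delay)
    return x * x


def run_pool(func, items, timeout, workers=2):
    """Map func over items, waiting at most `timeout` seconds for all results."""
    with ThreadPool(workers) as pool:
        # Same call shape as the _runPool frame in the traceback:
        # get(timeout) raises multiprocessing.TimeoutError if the
        # results are not all available in time.
        return pool.map_async(func, items).get(timeout)


def timed_out(delay, timeout):
    """Return True if mapping three items hits the timeout, False otherwise."""
    try:
        run_pool(functools.partial(slow_square, delay), [1, 2, 3], timeout)
    except multiprocessing.TimeoutError:
        return True
    return False
```

In this sketch, `timed_out(0.5, 0.05)` reports a timeout because each item takes longer than the allowed wait, while `timed_out(0.0, 5.0)` does not. The error says only that the wait expired; it does not tell a slow worker apart from a hung one.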

## Activity

Joshua Hoblitt added a comment -

To clarify, the hsc dataset is working. It differs from the cfht/decam datasets in that it uses pipe_drivers.

Joshua Hoblitt added a comment -

The default timeout value in pipe_base is 9999 s, which explains the ~2 hour 47 min runtime(s).
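As a quick check of that arithmetic, the sketch below converts the 9999 s default into hours, minutes, and seconds. `format_duration` is a hypothetical helper written for this illustration.

```python
def format_duration(seconds):
    """Format a whole number of seconds as 'Hh MMm SSs'."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return "%dh %02dm %02ds" % (hours, minutes, secs)


print(format_duration(9999))  # 2h 46m 39s
```

That is just under 2 hours 47 minutes, consistent with the runtime observed before each failure.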

Joshua Hoblitt added a comment -

It appears that the timeout can be controlled via a --timeout argument to processCcd.py. It is probably worth adding support for this flag to examples/runExample.sh, as we know both of these datasets should have much shorter runtimes.
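One way a wrapper could pass the flag through is sketched below in Python; the real change would live in the shell script. Only the `--timeout` flag name comes from the observation above. The helper name `with_timeout`, the `INPUT_REPO` placeholder, and the 1800 s value are assumptions made for illustration.

```python
def with_timeout(argv, timeout_s):
    """Return a copy of argv with '--timeout <seconds>' appended.

    If the caller already supplied --timeout (as a separate token or in
    --timeout=N form), the arguments are returned unchanged so an explicit
    choice is never overridden.
    """
    already_set = any(
        arg == "--timeout" or arg.startswith("--timeout=") for arg in argv
    )
    if already_set:
        return list(argv)
    return list(argv) + ["--timeout", str(int(timeout_s))]


# Hypothetical invocation; INPUT_REPO is a placeholder, not a real path.
cmd = with_timeout(["processCcd.py", "INPUT_REPO"], 1800)
print(" ".join(cmd))
```

A shorter timeout would not fix the underlying stall, but it would make the CI job fail sooner than the 9999 s default allows.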

Joshua Hoblitt added a comment - edited

When implementing DM-9749, I tested running with NUMPROC set to 1, and the cfht dataset was able to run to completion.

Joshua Hoblitt added a comment -

Per discussion on Slack with Michael Wood-Vasey, we are going to set NUMPROC = 1 in production for the cfht/decam datasets to get them working for the time being.

## People

• Assignee:
Unassigned
• Reporter:
Joshua Hoblitt
• Watchers:
Angelo Fausti, John Parejko, Jonathan Sick, Joshua Hoblitt, Michael Wood-Vasey