# Inconsistent source detection workflow with difference imaging


## Details

• Type: Story
• Status: To Do
• Resolution: Unresolved
• Fix Version/s: None
• Component/s:
• Labels:
• Team:

## Description

If a user wants to do difference imaging with some calexps as science images and some coadds as templates and decides to run ImageDifferenceTask with doSelectSources=True (the default!), the result is an error about how template sources are not available. Differencing works fine with doSelectSources=False.

The standard AP workflow doesn't include running detectCoaddSources, because we do source detection on the difference images. I may be wrong, but I don't think the result of image differencing benefits from having a catalog of sources previously detected in the template coadd. This problem is easy to encounter and difficult to troubleshoot if you haven't seen it before. A sample traceback, thanks to Hayden Smotherman, is below.

A resolution to this ticket can be one of (1) convince me I should be doing source detection on my templates before running difference imaging, (2) change the default to doSelectSources=False, OR (3) Your Idea Welcome Here.

    imageDifference INFO: Processing DataId(initialdata={'visit': 433932, 'ccdnum': 12, 'filter': 'g'}, tag=set())
    imageDifference.getTemplate INFO: Using skyMap tract 0
    imageDifference.getTemplate INFO: Assembling 2 coadd patches
    imageDifference.getTemplate INFO: exposure dimensions=(2046, 4094); coadd dimensions=(3247, 1644)
    imageDifference.getTemplate INFO: Reading patch {'datasetType': 'deepCoadd_sub', 'bbox': Box2I(minimum=Point2I(3666, 17405), dimensions=Extent2I(434, 1644)), 'tract': 0, 'patch': '0,4', 'numSubfilters': 3}
    imageDifference.getTemplate INFO: Reading patch {'datasetType': 'deepCoadd_sub', 'bbox': Box2I(minimum=Point2I(3900, 17405), dimensions=Extent2I(3013, 1644)), 'tract': 0, 'patch': '1,4', 'numSubfilters': 3}
    imageDifference INFO: Source selection via src product
    imageDifference FATAL: Failed on dataId=DataId(initialdata={'visit': 433932, 'ccdnum': 12, 'filter': 'g'}, tag=set()): RuntimeError: doSelectSources=True and kernelSourcesFromRef=False,but template sources not available. Cannot match science sources with template sources. Run process* on data from which templates are built.
    Traceback (most recent call last):
      File "/astro/store/epyc/users/smotherh/lsst/stack/miniconda3-4.5.4-fcd27eb/Linux64/pipe_base/16.0-30-g6787e8a+1/python/lsst/pipe/base/cmdLineTask.py", line 388, in __call__
        result = self.runTask(task, dataRef, kwargs)
      File "/astro/store/epyc/users/smotherh/lsst/stack/miniconda3-4.5.4-fcd27eb/Linux64/pipe_base/16.0-30-g6787e8a+1/python/lsst/pipe/base/cmdLineTask.py", line 447, in runTask
        return task.runDataRef(dataRef, **kwargs)
      File "/astro/store/epyc/users/smotherh/lsst/stack/miniconda3-4.5.4-fcd27eb/Linux64/pipe_base/16.0-30-g6787e8a+1/python/lsst/pipe/base/timer.py", line 149, in wrapper
        res = func(self, *args, **keyArgs)
      File "/astro/store/epyc/users/smotherh/lsst/stack/miniconda3-4.5.4-fcd27eb/Linux64/pipe_tasks/16.0-61-gb2b2650a/python/lsst/pipe/tasks/imageDifference.py", line 432, in runDataRef
        raise RuntimeError("doSelectSources=True and kernelSourcesFromRef=False,"
    RuntimeError: doSelectSources=True and kernelSourcesFromRef=False,but template sources not available. Cannot match science sources with template sources. Run process* on data from which templates are built.
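
For anyone trying to reason about when this error fires, here is a rough sketch of the decision as I understand it from the message above. It is not the code in imageDifference.py: the function name, the argument names and the returned labels are all made up for illustration, and the only things taken from the task are the two config names and the error condition.

```python
def choose_kernel_source_mode(do_select_sources, kernel_sources_from_ref,
                              template_sources_available):
    """Return a label for how kernel candidate sources would be chosen.

    Hypothetical helper: it only mirrors the condition described by the
    error message, not the real ImageDifferenceTask implementation.
    """
    if not do_select_sources:
        # doSelectSources=False: no catalog-based selection, differencing runs.
        return "no-selection"
    if kernel_sources_from_ref:
        # Assumed: sources are taken from the reference catalog instead.
        return "reference-catalog"
    if not template_sources_available:
        # The case hit in the traceback above.
        raise RuntimeError(
            "doSelectSources=True and kernelSourcesFromRef=False, "
            "but template sources not available."
        )
    return "match-science-to-template"
```

With the defaults described in this ticket, the call is effectively `choose_kernel_source_mode(True, False, False)`, which is the branch that raises.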

## Activity

John Swinbank added a comment -

Sooo... this issue is at least moderately subtle. Further, I'm hardly an expert on image differencing algorithms so Gabor Kovacs (or whoever) will have to correct my misapprehensions.

However. What's at issue here is: how do you generate a list of sources to use to generate your differencing kernel?

There are broadly two options (at least, in the code; there may be other ways of doing it in principle).

• You can get a list of appropriate sources in the field of view from your reference catalog; or
• You can get source lists from the science and template images, then cross-match them to get a list of sources that appear in both.
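
To make the second option concrete, here is a minimal sketch of a cross-match between two source lists. It assumes both lists are plain (x, y) positions in a shared pixel frame and uses a greedy nearest-neighbour match within a tolerance; the function name and the tolerance parameter are mine, and the stack's real matching code may well work differently.

```python
import math


def cross_match(science, template, max_sep):
    """Pair each science source with its nearest unused template source.

    science, template: lists of (x, y) positions in the same coordinate frame.
    max_sep: maximum separation (same units) for a pair to count as a match.
    Returns a list of (science_index, template_index) pairs.
    """
    matches = []
    used = set()  # template indices that are already paired
    for i, (sx, sy) in enumerate(science):
        best_j, best_d = None, max_sep
        for j, (tx, ty) in enumerate(template):
            if j in used:
                continue
            d = math.hypot(sx - tx, sy - ty)
            if d <= best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            matches.append((i, best_j))
    return matches
```

If the template list is empty, which is the situation in this ticket, the result is an empty list and there is nothing to build a kernel from.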

My hunch is that the second is likely to work better, since that's relying on what's actually in the data rather than what you hope is in the data. However, this is the sort of thing that Gabor Kovacs (or Eric Bellm) might have more educated opinions on than I do.

However, it looks like the author of this code agreed with my hunch, since that's what it attempts to do by default. And, of course, in order to do that it wants a list of sources from the science image (which it can get either from the Butler, or by running source detection) and from the template, which it assumes it can get from the Butler.

Now, you are thinking that's a questionable assumption. But, in fact, in operations it's not: templates are generated during data release production, which has already done exquisite source detection and stashed those catalogues away for later use. Relying on them rather than doing your own thing is the right thing to do.

So what does that mean? Well, there are a couple of options here:

• When we are ingesting templates, we could/should also be ingesting source lists to go with them. That would be more realistic, in terms of being similar to how things will work in operations.
• We could change the default setting to rely on fetching sources from the reference catalog, rather than from actual detections. We'd have to rely on Gabor & Eric to advise on whether they regard that as scientifically wise.
• We could just run a sourcefinding step on the template as part of the AP pipeline, which you don't like.

In any event, I don't think this one is easy to “fix” without a deeper understanding of what ought to be happening.

John Swinbank added a comment -

Oh, one other thing that might be at least tangentially relevant: in BG3-land, all tasks have to declare their inputs explicitly up front. That won't resolve the dilemma above, but it will explicitly flag it and enable the pipeline to fail early with a useful error message: if the task requires a coadd source catalog and one isn't provided, you simply won't be able to build an execution graph.
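
As a toy illustration of that fail-early idea (not the actual middleware API; the function names and dataset type names below are invented for the example), a pre-flight check over declared inputs might look like this:

```python
def find_missing_inputs(tasks, available):
    """Return {task_name: [missing dataset types]} for unsatisfied inputs.

    tasks: dict mapping task name -> {"inputs": [...], "outputs": [...]}.
    available: set of dataset types already present in the repository.
    Dataset type names are illustrative only.
    """
    # Anything already stored, or produced by some task, counts as satisfiable.
    satisfiable = set(available)
    for spec in tasks.values():
        satisfiable.update(spec["outputs"])

    missing = {}
    for name, spec in tasks.items():
        lacking = [d for d in spec["inputs"] if d not in satisfiable]
        if lacking:
            missing[name] = lacking
    return missing


def build_graph(tasks, available):
    """Refuse to build anything if a declared input cannot be satisfied."""
    missing = find_missing_inputs(tasks, available)
    if missing:
        raise ValueError(f"cannot build execution graph, missing inputs: {missing}")
    return sorted(tasks)  # stand-in for a real graph
```

For example, a differencing task declared with inputs `["calexp", "template_coadd", "template_sources"]` against a repository holding only `{"calexp", "template_coadd"}` would be rejected before any processing starts, with the missing catalog named in the message.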

Eric Bellm added a comment -

In operations, don't we expect that "refcat sources" and "template sources" are more-or-less identical? They are both coming from LSST DRP.

In general I would expect that we should not have to re-run source detection on the template every time we do difference imaging, so I think that points to option 1 as most realistic, as you say. Whether it's worth changing the defaults now I don't have a strong opinion.

John Swinbank added a comment -

> In operations, don't we expect that "refcat sources" and "template sources" are more-or-less identical? They are both coming from LSST DRP.

Well... not in year 1, anyway.

Beyond that, LDM-151 says that AP will use some “subset of the Object table” as a reference catalog. That's not quite the same as “the results of source detection on the template”.

Worth noting that if we're building DCR-corrected templates “on the fly”, they won't have pre-ingested source catalogs.


## People

• Assignee: Unassigned
• Reporter: Meredith Rawls
• Watchers: Colin Slater, Eric Bellm, Gabor Kovacs, John Swinbank, Meredith Rawls