  1. Data Management
  2. DM-34129

Allow DataFrame Actions to take formattable columns


    Details

    • Type: Story
    • Status: Invalid
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: analysis_drp, pipe_tasks
    • Labels:
      None
    • Story Points:
      2
    • Team:
      Data Release Production
    • Urgent?:
      No

      Description

      Allow DataFrameActions to format their column names based on passed-in keywords. This will allow downstream plotting code to work over multiple bands within one task.
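
A minimal sketch of what a formattable column could look like, offered only as an illustration of the request. The class `FormattableColumnAction`, its `columnTemplate` field, and the `{band}` placeholder are hypothetical names invented for this example; they are not the pipe_tasks or analysis_drp API.

```python
from dataclasses import dataclass


@dataclass
class FormattableColumnAction:
    """Hypothetical action whose column name is a format template."""

    # Template such as "{band}_psfFlux"; the "{band}" placeholder is an
    # assumed convention for this sketch only.
    columnTemplate: str

    def columns(self, **kwargs):
        # Resolve the template with whatever keywords the caller supplies.
        return [self.columnTemplate.format(**kwargs)]

    def __call__(self, table, **kwargs):
        # "table" is any mapping from column name to values (a dict here,
        # but a pandas DataFrame supports the same lookup).
        (name,) = self.columns(**kwargs)
        return table[name]


# One configured action serves several bands within a single task.
action = FormattableColumnAction("{band}_psfFlux")
table = {"g_psfFlux": [1.0, 2.0], "r_psfFlux": [3.0, 4.0]}
perBand = {band: action(table, band=band) for band in ("g", "r")}
```

The point of the design is that the band is supplied when the action is called rather than when it is configured, so the same configuration can be reused for every band.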


          Activity

          kbechtol Keith Bechtol added a comment - - edited

          Here is a summary of the current implementation of selector actions in faro as related to this ticket.

          Problem we were trying to solve:
          Imagine that we have an object catalog for a given tract and we want to compute one metric value per band. We iterate over bands, and for each band we want to load only the band-specific columns that the selector actions require (e.g., load g_psfFlux in order to apply an SNR selection in the g band to get bright stars).

          Selector definitions are located in https://github.com/lsst/faro/blob/main/python/lsst/faro/utils/selectors.py; see, for example, SNRSelector.

          For an example of how selectors are used to identify which columns to load from objectTable_tract, see https://github.com/lsst/faro/blob/main/python/lsst/faro/measurement/TractTableMeasurement.py, in particular the runQuantum method of TractTableMeasurementTask:

              def runQuantum(self, butlerQC, inputRefs, outputRefs):
                  inputs = butlerQC.get(inputRefs)
                  kwargs = {"currentBands": butlerQC.quantum.dataId['band']}
           
                  columns = list(self.config.measure.columns.values())
                  for column in self.config.measure.columnsBand.values():
                      columns.append(kwargs["currentBands"] + '_' + column)
                  columnsWithSelectors = self._getTableColumnsSelectors(columns, kwargs["currentBands"])
                  kwargs["catalog"] = inputs["catalog"].get(parameters={"columns": columnsWithSelectors})
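
The column-name assembly in the excerpt above can be tried in isolation. The following is a sketch only: `buildColumnList` is a hypothetical helper, and the example column names are placeholders rather than values taken from a faro configuration.

```python
def buildColumnList(columns, columnsBand, band):
    """Combine band-independent columns with band-prefixed ones.

    columns and columnsBand are dicts whose values are column names,
    mirroring config.measure.columns and config.measure.columnsBand in
    the excerpt; band is the band of the current quantum.
    """
    names = list(columns.values())
    # Band-dependent columns are stored without a prefix and get
    # "<band>_" prepended at run time.
    names.extend(f"{band}_{column}" for column in columnsBand.values())
    return names


names = buildColumnList(
    {"ra": "coord_ra", "dec": "coord_dec"},  # placeholder column names
    {"psfFlux": "psfFlux"},
    "g",
)
```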
          

          The _getTableColumnsSelectors function is a method of CatalogMeasurementBaseTask, defined in https://github.com/lsst/faro/blob/main/python/lsst/faro/base/CatalogMeasurementBase.py:

              def _getTableColumnsSelectors(self, columns, currentBands=None):
                  """Return the columns required to apply the configured selectors.

                  Parameters
                  ----------
                  columns : `list` [`str`]
                      Columns required to calculate a metric. Any additional
                      columns required by the selectorActions are added to these.
                  currentBands : `list` [`str`], optional
                      The filter band(s) associated with the observations.

                  Returns
                  -------
                  columnNames : `set` [`str`]
                      The columns required to compute a metric, together with any
                      additional columns required by the selectorActions.
                  """
                  columnNames = set(columns)
                  for actionStruct in [self.config.measure.selectorActions]:
                      for action in actionStruct:
                          for col in action.columns(currentBands):
                              columnNames.add(col)
           
                  return columnNames
          

          The critical line of code above is the following:

          action.columns(currentBands)

          where we pass the relevant band or bands to the selector action so that it returns the list of columns corresponding to those bands. For example, for the SNRSelector:

              def columns(self, currentBands=None):
                  allCols = []
                  if self.selectorBandType == "staticBandSet":
                      bands = self.staticBandSet
                  else:
                      bands = currentBands
           
                  if bands is not None:
                      for band in bands:
                          allCols += [band+'_'+self.fluxType, band+'_'+self.fluxType+'Err']
                  else:
                      allCols = [self.fluxType, self.fluxType+'Err']
                  return allCols
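
To see how the two pieces above fit together, here is a self-contained sketch. `SnrSelectorSketch` and `collectColumns` are stand-ins written for this example, not the faro classes; the only behaviour assumed is what the excerpts show, namely that a selectorBandType of "staticBandSet" uses the configured bands and any other value falls back to the bands passed in at run time.

```python
class SnrSelectorSketch:
    """Stand-in for a selector whose columns depend on the band(s)."""

    def __init__(self, fluxType="psfFlux", selectorBandType="currentBands",
                 staticBandSet=()):
        self.fluxType = fluxType
        self.selectorBandType = selectorBandType
        self.staticBandSet = list(staticBandSet)

    def columns(self, currentBands=None):
        # Bands fixed in configuration take precedence when requested;
        # otherwise use the bands supplied at run time.
        if self.selectorBandType == "staticBandSet":
            bands = self.staticBandSet
        else:
            bands = currentBands
        if bands is None:
            return [self.fluxType, self.fluxType + "Err"]
        cols = []
        for band in bands:
            cols += [f"{band}_{self.fluxType}", f"{band}_{self.fluxType}Err"]
        return cols


def collectColumns(columns, selectors, currentBands=None):
    """Union of the metric columns and every selector's columns."""
    names = set(columns)
    for selector in selectors:
        names.update(selector.columns(currentBands))
    return names


dynamic = SnrSelectorSketch()
static = SnrSelectorSketch(selectorBandType="staticBandSet",
                           staticBandSet=["i"])
needed = collectColumns(["coord_ra"], [dynamic, static], currentBands=["g"])
```

In this sketch the bands are passed as a list. A bare string such as "g" would also work, because iterating over a one-character string yields that character, but a multi-character band name would be split into separate characters, so wrapping the band in a list is the safer choice.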
          

          As I understand it, the difference in the implementation of selector actions for analysis_drp (https://github.com/lsst/analysis_drp/blob/main/python/lsst/analysis/drp/dataSelectors.py) is that the bands are not updated at run time, but are set in advance in configuration.

          nlust Nate Lust added a comment -

          This has been completely superseded by the analysis_tools redesign, where the functionality is largely as desired.


            People

            Assignee:
            nlust Nate Lust
            Reporter:
            nlust Nate Lust
            Watchers:
            Jeffrey Carlin, Keith Bechtol, Nate Lust, Peter Ferguson, Sophie Reed, Yusra AlSayyad