Data Management / DM-36701

Define configuration for obscore manager at USDF


    Details

      Description

      While we are waiting for the pgsphere installation at usdf-butler, we can try to agree on what configuration is needed for the obscore manager. The configuration determines the schema of the obscore table, so it's best to get it right up front and avoid schema migrations later. A good starting point is the configuration used for DP0.2, which was made to populate the obscore table in QServ. Compared to that one, I expect a few differences:

      • I want to add a few extra columns, in particular visit ID and exposure ID, which are necessary for determining the exposure region
      • pg_sphere plugin configuration
      • collection configuration: the CSV export tool could use any collection type, but the manager configuration accepts RUN-type collection patterns (I think we are not ready for a single TAGGED collection case)

      I think the main points in this exercise are:

      • determine which repositories we want to obscore-ize
      • for each repo define a list of collection names/patterns to include
      • similarly dataset types

      Once the configuration is settled, I can write migration script(s) to add it to the repositories.
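
      As a rough illustration of the collection selection described above, here is a minimal Python sketch of filtering RUN-type collections by regular-expression patterns. The patterns are the ones proposed for the embargo repository later in this ticket; the collection names, the `select_runs` helper, and the use of full-string matching are assumptions made for the example, not the manager's actual code.

```python
import re

# Patterns of the kind proposed for the embargo repository later in this ticket.
patterns = ["LATISS/.*", "LSSTCam/.*", "LSSTComCam/.*"]

# Hypothetical (name, type) pairs standing in for registry contents.
collections = [
    ("LATISS/raw/all", "RUN"),
    ("LATISS/defaults", "CHAINED"),
    ("LSSTComCam/raw/all", "RUN"),
    ("HSC/raw/all", "RUN"),
    ("u/someone/LATISS/test", "RUN"),
]


def select_runs(collections, patterns):
    """Return names of RUN-type collections that fully match any pattern.

    Full-string matching is an assumption of this sketch.
    """
    compiled = [re.compile(p) for p in patterns]
    return [
        name
        for name, ctype in collections
        if ctype == "RUN" and any(rx.fullmatch(name) for rx in compiled)
    ]


selected = select_runs(collections, patterns)
print(selected)
```

      Under these assumptions, CHAINED collections and RUN collections outside the listed instrument prefixes are skipped, which is the kind of check worth doing per repository before settling the pattern list.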

            Activity

            salnikov Andy Salnikov created issue -
            salnikov Andy Salnikov added a comment -

            Configuration used for exporting obscore to QServ (from here):

            facility_name: Rubin-LSST
            obs_collection: LSST.DP02
            collections: ["2.2i/runs/DP0.2"]
            use_butler_uri: false
            dataset_types:
              raw:
                dataproduct_type: image
                dataproduct_subtype: lsst.raw
                calib_level: 1
                obs_id_fmt: "{records[exposure].obs_id}-{records[detector].full_name}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              calexp:
                dataproduct_type: image
                dataproduct_subtype: lsst.calexp
                calib_level: 2
                obs_id_fmt: "{records[visit].name}-{records[detector].full_name}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              deepCoadd_calexp:
                dataproduct_type: image
                dataproduct_subtype: lsst.deepCoadd_calexp
                calib_level: 3
                obs_id_fmt: "{skymap}-{tract}-{patch}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              goodSeeingCoadd:
                dataproduct_type: image
                dataproduct_subtype: lsst.goodSeeingCoadd
                calib_level: 3
                obs_id_fmt: "{skymap}-{tract}-{patch}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              goodSeeingDiff_differenceExp:
                dataproduct_type: image
                dataproduct_subtype: lsst.goodSeeingDiff_differenceExp
                calib_level: 3
                obs_id_fmt: "{records[visit].name}-{records[detector].full_name}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
            extra_columns:
              lsst_visit:
                template: "{visit}"
                type: "int"
              lsst_detector:
                template: "{detector}"
                type: "int"
              lsst_tract:
                template: "{tract}"
                type: "int"
              lsst_patch:
                template: "{patch}"
                type: "int"
              lsst_band:
                template: "{band}"
                type: "str"
              lsst_filter:
                template: "{physical_filter}"
                type: "str"
            spectral_ranges:
              "u": [330.0e-9, 400.0e-9]
              "g": [402.0e-9, 552.0e-9]
              "r": [552.0e-9, 691.0e-9]
              "i": [691.0e-9, 818.0e-9]
              "z": [818.0e-9, 922.0e-9]
              "y": [970.0e-9, 1060.0e-9]
            

            salnikov Andy Salnikov made changes -
            Field Original Value New Value
            Watchers Andy Salnikov, Gregory Dubois-Felsmann, Jim Bosch, Tim Jenness [ Andy Salnikov, Gregory Dubois-Felsmann, Jim Bosch, Tim Jenness ] Andy Salnikov, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, Tim Jenness [ Andy Salnikov, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, Tim Jenness ]
            salnikov Andy Salnikov added a comment -

            Related question: should I use the table schema definition from sdm_schemas? Right now the schema and all column names are hard-coded in the Python code. They should be consistent with sdm_schemas, but more importantly the code which fills the records must be consistent with the table schema. I think at this point it probably makes sense to keep things as they are now, but I'm open to suggestions.

            tjenness Tim Jenness made changes -
            Watchers Andy Salnikov, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, Tim Jenness [ Andy Salnikov, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, Tim Jenness ] Andy Salnikov, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, Kian-Tat Lim, Tim Jenness [ Andy Salnikov, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, Kian-Tat Lim, Tim Jenness ]
            tjenness Tim Jenness added a comment -

            Are you saying we should start considering depending on felis et al for the dax_obscore package or daf_butler?

            As for the repos to migrate, I assume /repo/oga is the most important one for recent observing and needs the incremental support the most, but that partly depends on the first use case (cc/ Frossie Economou). /repo/main opens up a much bigger question as to whether we are including all the HSC data or just LATISS and ComCam.

            salnikov Andy Salnikov added a comment -

            I would not really want a felis dependency for daf_butler. OTOH there are two definitions of the obscore schema: one hard-coded into daf_butler (and the obscore config) and another in sdm_schemas. It's not clear to me how we can make sure that they are consistent. I guess this is something to consider for future development.

            salnikov Andy Salnikov added a comment -

            What credentials do I need for /repo/oga? I'm getting "botocore.exceptions.NoCredentialsError: Unable to locate credentials" when instantiating Butler for that repo.

            salnikov Andy Salnikov made changes -
            Epic Link DM-30629 [ 513192 ] PREOPS-1593 [ 2393150 ]
            Sprint DB_F22_6 [ 1172 ]
            Team Data Access and Database [ 10204 ] Ops Middleware [ 15600 ]
            salnikov Andy Salnikov added a comment -

            Initially we are going to add obscore to the OGA/embargo repository. It's easier to experiment there as it has less data, and we can drop and recreate the obscore table quickly if we need to improve the configuration.

            Gregory Dubois-Felsmann, here is the first iteration of the obscore configuration for that repo, based on the DP02 configuration above. I annotated it so it should be easier to figure out what changes are needed.

            # namespace and version are for possible schema migration only, do not affect table contents
            namespace: embargo
            version: 1
            facility_name: Rubin-LSST
            obs_collection: LSST.EMBARGO      # this likely needs a better name?
            collection_type: RUN              # means we are using all RUN-type collections that match line below
            collections: ["LATISS/.*", "LSSTCam/.*", "LSSTComCam/.*"]
            use_butler_uri: false             # do not use URI from Butler, use datalink_url_fmt defined below
            dataset_types:
              raw:
                dataproduct_type: image
                dataproduct_subtype: lsst.raw
                calib_level: 1
                obs_id_fmt: "{records[exposure].obs_id}-{records[detector].full_name}"
                o_ucd: phot.count
                access_format: image/fits
                # I guess we need to change this and use something different in place of "dp02"?
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              calexp:
                dataproduct_type: image
                dataproduct_subtype: lsst.calexp
                calib_level: 2
                obs_id_fmt: "{records[visit].name}-{records[detector].full_name}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              deepCoadd_calexp:
                dataproduct_type: image
                dataproduct_subtype: lsst.deepCoadd_calexp
                calib_level: 3
                obs_id_fmt: "{skymap}-{tract}-{patch}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              goodSeeingCoadd:
                dataproduct_type: image
                dataproduct_subtype: lsst.goodSeeingCoadd
                calib_level: 3
                obs_id_fmt: "{skymap}-{tract}-{patch}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              goodSeeingDiff_differenceExp:
                dataproduct_type: image
                dataproduct_subtype: lsst.goodSeeingDiff_differenceExp
                calib_level: 3
                obs_id_fmt: "{records[visit].name}-{records[detector].full_name}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
            extra_columns:
              lsst_visit:
                template: "{visit}"
                type: "int"
              lsst_exposure:
                template: "{exposure}"
                type: "int"
              lsst_detector:
                template: "{detector}"
                type: "int"
              lsst_tract:
                template: "{tract}"
                type: "int"
              lsst_patch:
                template: "{patch}"
                type: "int"
              lsst_band:
                template: "{band}"
                type: "string"
                length: 32
              lsst_filter:
                template: "{physical_filter}"
                type: "string"
                length: 32
              lsst_dataset_type:
                template: "{dataset_type}"
                type: "string"
                length: 64
              lsst_run:
                template: "{run}"
                type: "string"
                length: 255
            indices:
              # Indices for obscore table, spatial columns are indexed automatically.
              # We likely will need to extend this list to support most popular queries,
              # would be good to have a list of possible queries generated by TAP.
              instrument_name_idx: instrument_name
              lsst_visit_idx: lsst_visit
              lsst_exposure_idx: lsst_exposure
              dataproduct_idx: [dataproduct_type, dataproduct_subtype]
            spectral_ranges:
              # This list includes every band currently defined in the registry; actual values for
              # some of them are probably very approximate. Keys in this section could be a band
              # name or a physical filter name
              "u": [330.0e-9, 400.0e-9]
              "u~nd": [330.0e-9, 400.0e-9]
              "g": [402.0e-9, 552.0e-9]
              "g~nd": [402.0e-9, 552.0e-9]
              "r": [552.0e-9, 691.0e-9]
              "r~nd": [552.0e-9, 691.0e-9]
              "i": [691.0e-9, 818.0e-9]
              "i~nd": [691.0e-9, 818.0e-9]
              "z": [818.0e-9, 922.0e-9]
              "z~nd": [818.0e-9, 922.0e-9]
              "y": [970.0e-9, 1060.0e-9]
              "y~nd": [970.0e-9, 1060.0e-9]
              "white": [null, null]
              "unknown": [null, null]
              "diffuser": [null, null]
              "notch": [null, null]
              "grid": [null, null]
              "grid~nd": [null, null]
              "spot": [null, null]
              "spot~nd": [null, null]
            spatial_plugins:
              pgsphere:
                # adds pgsphere columns and indices
                cls: lsst.daf.butler.registry.obscore.pgsphere.PgSphereObsCorePlugin
                config:
                  region_column: pgs_region         # name of a column for a region/polygon
                  position_column: pgs_center       # name of a column for position/center
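
            The `obs_id_fmt` and `datalink_url_fmt` values above use Python format-string syntax, including item and attribute lookups such as `{records[visit].name}`. The sketch below shows how such templates expand with plain `str.format`. The record objects and all values are hypothetical stand-ins; the real records come from the Butler registry, and the exact set of names the manager passes to the templates is not shown in this ticket.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for dimension records; values are invented.
records = {
    "visit": SimpleNamespace(name="visit_0001"),
    "detector": SimpleNamespace(full_name="R22_S11"),
}

# Templates copied from the configuration above.
obs_id_fmt = "{records[visit].name}-{records[detector].full_name}"
coadd_obs_id_fmt = "{skymap}-{tract}-{patch}"
datalink_url_fmt = (
    "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
)

# "[visit]" is a dict lookup with the string key "visit"; ".name" is attribute access.
obs_id = obs_id_fmt.format(records=records)

# Coadd templates use data ID values directly (hypothetical values here).
coadd_obs_id = coadd_obs_id_fmt.format(skymap="example_skymap", tract=3828, patch=24)

# "%3A" is left untouched by str.format; only "{id}" is substituted.
url = datalink_url_fmt.format(id="hypothetical-dataset-id")

print(obs_id)
print(coadd_obs_id)
print(url)
```

            Since only `{id}` is substituted in the datalink template, replacing the "dp02" label for another repository means editing the template text itself.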
            

            Some general comments:

            • spectral_ranges defines every possible band, this is because obscore manager will produce a warning for a band/filter which is not in this list. I should probably remove that warning to simply store NULL for unknown filters.
            • spectral_ranges now defines mapping for band names, if more exact per-physical filter ranges are needed, they can be added there using physical filter name (physical filter name takes precedence over band name there)
            • I guess datalink_url_fmt will be different from DP02
            • we likely need more indices on obscore table, but that needs some feedback from TAP for what sorts of queries we should expect to be popular
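
            To make the spectral_ranges lookup order concrete, here is a minimal Python sketch of the behavior described in the comments above: the physical filter name is tried first, then the band name, and a filter that is not listed yields NULL bounds (the proposed behavior, rather than the current warning). The `lookup_range` helper and the physical-filter entry are hypothetical and are not taken from daf_butler.

```python
# A small subset of the spectral_ranges section, plus one invented
# physical-filter entry to show precedence (name and values are hypothetical).
spectral_ranges = {
    "r": (552.0e-9, 691.0e-9),
    "white": (None, None),
    "HYPOTHETICAL_r_01": (560.0e-9, 680.0e-9),
}


def lookup_range(physical_filter, band):
    """Return (min, max) wavelength in meters, or (None, None) if unknown.

    The physical filter name takes precedence over the band name.
    """
    for key in (physical_filter, band):
        if key is not None and key in spectral_ranges:
            return spectral_ranges[key]
    return (None, None)


print(lookup_range("HYPOTHETICAL_r_01", "r"))  # physical filter entry wins
print(lookup_range("some_other_filter", "r"))  # falls back to the band entry
print(lookup_range("some_other_filter", "no_such_band"))  # unknown: NULL bounds
```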
            salnikov Andy Salnikov made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            ktl Kian-Tat Lim added a comment -

            https://github.com/slaclab/phalanx/blob/usdfprod/science-platform/values-usdfprod.yaml#L33-L34 will likely need to be changed.
            salnikov Andy Salnikov added a comment - - edited

            Kian-Tat Lim, I have no idea what that is or why I need to change it. This ticket is for defining the configuration for the registry manager so that we can create and fill the obscore table. I guess that something also needs to be done for the TAP service, but it should be done on a separate ticket.

            gpdf Gregory Dubois-Felsmann made changes -
            Link This issue relates to DM-37275 [ DM-37275 ]
            gpdf Gregory Dubois-Felsmann made changes -
            Link This issue relates to DM-37276 [ DM-37276 ]
            salnikov Andy Salnikov made changes -
            Link This issue relates to DM-35850 [ DM-35850 ]
            salnikov Andy Salnikov made changes -
            Resolution Done [ 10000 ]
            Status In Progress [ 3 ] Done [ 10002 ]
            salnikov Andy Salnikov added a comment -

            Closing this; Gregory added an initial version of a config file on DM-37276.

            salnikov Andy Salnikov made changes -
            Component/s dax_obscore [ 20400 ]
            salnikov Andy Salnikov made changes -
            Story Points 2

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Watchers:
              Andy Salnikov, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, Kian-Tat Lim, Tim Jenness