  1. Data Management
  2. DM-36701

Define configuration for obscore manager at USDF

    Details

      Description

      While we wait for pgsphere to be installed at usdf-butler, we can try to agree on what configuration is needed for the obscore manager. The configuration determines the schema of the obscore table, so it is best to get it right the first time and avoid schema migrations later. A good starting point is the configuration used for DP0.2, which was made to populate the obscore table in QServ. Compared to that one, I expect a few differences:

      • I want to add a few extra columns, in particular visit ID and exposure ID, which are necessary for determining the exposure region
      • pg_sphere plugin configuration
      • collection configuration: the CSV export tool could use any collection type, but the manager configuration accepts RUN-type collection patterns (I think we are not ready for a single TAGGED collection case)
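
To illustrate the RUN-type collection patterns mentioned in the last item, here is a minimal sketch of how a list of regular expressions could select run collections. It uses the patterns that appear in the draft configuration later in this ticket. The collection names and the `matching_runs` helper are hypothetical, and the use of full-string matching is an assumption about how the manager applies the patterns, not something this ticket confirms.

```python
import re

# Patterns copied from the draft `collections` setting discussed in this ticket.
COLLECTION_PATTERNS = ["LATISS/.*", "LSSTCam/.*", "LSSTComCam/.*"]


def matching_runs(run_names, patterns=COLLECTION_PATTERNS):
    """Return the run names that fully match at least one pattern.

    Hypothetical helper: full-string matching is an assumption here.
    """
    compiled = [re.compile(p) for p in patterns]
    return [name for name in run_names if any(c.fullmatch(name) for c in compiled)]


# Hypothetical collection names, for illustration only.
example_runs = [
    "LATISS/raw/all",
    "LSSTComCam/raw/all",
    "LSSTCam/calib/unbounded",
    "u/someone/LATISS/test",
    "refcats",
]
selected = matching_runs(example_runs)
print(selected)
```

Because every pattern is anchored on an instrument prefix, a user collection such as `u/someone/LATISS/test` would not be selected under this assumption, even though it contains an instrument name.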

      I think the main points in this exercise are:

      • determine which repositories we want to obscore-ize
      • for each repo define a list of collection names/patterns to include
      • similarly dataset types

      Once the configuration is settled, I can write migration script(s) to add it to the repositories.

            Activity

            salnikov Andy Salnikov added a comment -

            What credentials do I need for /repo/oga? I'm getting "botocore.exceptions.NoCredentialsError: Unable to locate credentials" when instantiating Butler for that repo.

            salnikov Andy Salnikov added a comment -

            Initially we are going to add obscore to the OGA/embargo repository. It is easier to experiment there because it has less data, and we can drop and recreate the obscore table quickly if we need to improve the configuration.

            Gregory Dubois-Felsmann, here is the first iteration of the obscore configuration for that repo, based on the DP02 configuration above. I annotated it so it should be easier to figure out what changes are needed.

            # namespace and version are for possible schema migration only, do not affect table contents
            namespace: embargo
            version: 1
            facility_name: Rubin-LSST
            obs_collection: LSST.EMBARGO      # this likely needs a better name?
            collection_type: RUN              # means we are using all RUN-type collections that match line below
            collections: ["LATISS/.*", "LSSTCam/.*", "LSSTComCam/.*"]
            use_butler_uri: false             # do not use URI from Butler, use datalink_url_fmt defined below
            dataset_types:
              raw:
                dataproduct_type: image
                dataproduct_subtype: lsst.raw
                calib_level: 1
                obs_id_fmt: "{records[exposure].obs_id}-{records[detector].full_name}"
                o_ucd: phot.count
                access_format: image/fits
                # I guess we need to change this and use something different in place of "dp02"?
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              calexp:
                dataproduct_type: image
                dataproduct_subtype: lsst.calexp
                calib_level: 2
                obs_id_fmt: "{records[visit].name}-{records[detector].full_name}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              deepCoadd_calexp:
                dataproduct_type: image
                dataproduct_subtype: lsst.deepCoadd_calexp
                calib_level: 3
                obs_id_fmt: "{skymap}-{tract}-{patch}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              goodSeeingCoadd:
                dataproduct_type: image
                dataproduct_subtype: lsst.goodSeeingCoadd
                calib_level: 3
                obs_id_fmt: "{skymap}-{tract}-{patch}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
              goodSeeingDiff_differenceExp:
                dataproduct_type: image
                dataproduct_subtype: lsst.goodSeeingDiff_differenceExp
                calib_level: 3
                obs_id_fmt: "{records[visit].name}-{records[detector].full_name}"
                o_ucd: phot.count
                access_format: image/fits
                datalink_url_fmt: "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"
            extra_columns:
              lsst_visit:
                template: "{visit}"
                type: "int"
              lsst_exposure:
                template: "{exposure}"
                type: "int"
              lsst_detector:
                template: "{detector}"
                type: "int"
              lsst_tract:
                template: "{tract}"
                type: "int"
              lsst_patch:
                template: "{patch}"
                type: "int"
              lsst_band:
                template: "{band}"
                type: "string"
                length: 32
              lsst_filter:
                template: "{physical_filter}"
                type: "string"
                length: 32
              lsst_dataset_type:
                template: "{dataset_type}"
                type: "string"
                length: 64
              lsst_run:
                template: "{run}"
                type: "string"
                length: 255
            indices:
              # Indices for obscore table, spatial columns are indexed automatically.
              # We likely will need to extend this list to support most popular queries,
              # would be good to have a list of possible queries generated by TAP.
              instrument_name_idx: instrument_name
              lsst_visit_idx: lsst_visit
              lsst_exposure_idx: lsst_exposure
              dataproduct_idx: [dataproduct_type, dataproduct_subtype]
            spectral_ranges:
              # This list includes every band currently defined in the registry; actual values
              # for some of them are probably very approximate. Keys in this section could be
              # a band name or a physical filter name.
              "u": [330.0e-9, 400.0e-9]
              "u~nd": [330.0e-9, 400.0e-9]
              "g": [402.0e-9, 552.0e-9]
              "g~nd": [402.0e-9, 552.0e-9]
              "r": [552.0e-9, 691.0e-9]
              "r~nd": [552.0e-9, 691.0e-9]
              "i": [691.0e-9, 818.0e-9]
              "i~nd": [691.0e-9, 818.0e-9]
              "z": [818.0e-9, 922.0e-9]
              "z~nd": [818.0e-9, 922.0e-9]
              "y": [970.0e-9, 1060.0e-9]
              "y~nd": [970.0e-9, 1060.0e-9]
              "white": [null, null]
              "unknown": [null, null]
              "diffuser": [null, null]
              "notch": [null, null]
              "grid": [null, null]
              "grid~nd": [null, null]
              "spot": [null, null]
              "spot~nd": [null, null]
            spatial_plugins:
              pgsphere:
                # adds pgsphere columns and indices
                cls: lsst.daf.butler.registry.obscore.pgsphere.PgSphereObsCorePlugin
                config:
                  region_column: pgs_region         # name of a column for a region/polygon
                  position_column: pgs_center       # name of a column for position/center
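
As a rough illustration of how the `obs_id_fmt` and `datalink_url_fmt` templates above might expand, the sketch below applies Python `str.format` to the two templates from the `raw` entry. It assumes the templates follow `str.format` field syntax (which their form suggests, but which the actual manager code should confirm). The record values and the dataset ID are made up for the example.

```python
from types import SimpleNamespace
from uuid import UUID

# Templates copied from the `raw` dataset type entry above.
OBS_ID_FMT = "{records[exposure].obs_id}-{records[detector].full_name}"
DATALINK_URL_FMT = "https://data.lsst.cloud/api/datalink/links?ID=butler%3A//dp02/{id}"

# Hypothetical stand-ins for dimension records and a dataset ID.
records = {
    "exposure": SimpleNamespace(obs_id="AT_O_20220912_000123"),
    "detector": SimpleNamespace(full_name="RXX_S00"),
}
dataset_id = UUID("12345678-1234-5678-1234-567812345678")

# `records[exposure]` is a string-key lookup; `.obs_id` is attribute access.
obs_id = OBS_ID_FMT.format(records=records)
# The percent-encoded "%3A" passes through str.format unchanged.
access_url = DATALINK_URL_FMT.format(id=dataset_id)

print(obs_id)
print(access_url)
```

The repository label embedded in the URL (`dp02`) is a literal part of the template, so pointing the links at a different repository means editing each `datalink_url_fmt` entry.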
            

            Some general comments:

            • spectral_ranges defines every possible band because the obscore manager produces a warning for any band/filter that is not in this list. I should probably remove that warning and simply store NULL for unknown filters.
            • spectral_ranges currently defines the mapping for band names. If more exact per-physical-filter ranges are needed, they can be added there using the physical filter name (the physical filter name takes precedence over the band name).
            • I guess datalink_url_fmt will be different from DP02.
            • We likely need more indices on the obscore table, but that needs some feedback from TAP on what sorts of queries we should expect to be popular.
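
The lookup order described in the first two comments can be sketched as follows. This is only an illustration of the stated behavior (physical filter first, then band, with a warning and NULL values when neither is known), not the manager's actual code. The `spectral_range` function and the `hypothetical_r_filter` entry are invented for the example, and the wavelengths are presumably in meters.

```python
import warnings

# A subset of the `spectral_ranges` mapping above, plus one hypothetical
# physical-filter entry (not in the draft config) to show precedence.
SPECTRAL_RANGES = {
    "r": (552.0e-9, 691.0e-9),
    "white": (None, None),
    "hypothetical_r_filter": (560.0e-9, 690.0e-9),
}


def spectral_range(band, physical_filter, ranges=SPECTRAL_RANGES):
    """Return (em_min, em_max), preferring the physical filter over the band."""
    for key in (physical_filter, band):
        if key is not None and key in ranges:
            return ranges[key]
    # Neither name is configured: warn and fall back to NULL values.
    warnings.warn(f"no spectral range for band={band!r}, filter={physical_filter!r}")
    return (None, None)


# The physical filter entry wins when both names are configured.
print(spectral_range("r", "hypothetical_r_filter"))
# An unlisted physical filter falls back to the band entry.
print(spectral_range("r", "some_other_filter"))
```

Listing a band with `[null, null]`, as the draft does for `white` and `unknown`, yields the same stored values as an unlisted band but avoids the warning.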
            ktl Kian-Tat Lim added a comment -

            https://github.com/slaclab/phalanx/blob/usdfprod/science-platform/values-usdfprod.yaml#L33-L34 will likely need to be changed.

            salnikov Andy Salnikov added a comment - - edited

            Kian-Tat Lim, I don't know what that is or why I would need to change it. This ticket is for defining the configuration for the registry manager so that we can create and fill the obscore table. I guess that something also needs to be done for the TAP service, but that should be done on a separate ticket.

            salnikov Andy Salnikov added a comment -

            Closing this; Gregory added an initial version of a config file on DM-37276.


              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              salnikov Andy Salnikov
              Watchers:
              Andy Salnikov, Fritz Mueller, Gregory Dubois-Felsmann, Jim Bosch, Kian-Tat Lim, Tim Jenness