Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-33490

Implement record import and export methods on concrete Datastores

    XMLWordPrintable

    Details

    • Story Points:
      3
    • Sprint:
      DB_S22_12
    • Team:
      Data Access and Database
    • Urgent?:
      No

      Description

      Unfinished business from DM-32072: Datastore now has import_records and export_records methods with do-nothing implementations. These should be implemented to actually do something for any concrete Datastore that has internal records, and the base class implementation should probably be made abstract, since that's the usual case.

        Attachments

          Issue Links

            Activity

            Hide
            tjenness Tim Jenness added a comment -

            The API I use in FileDatastore is _get_stored_records_associated_with_refs which returns `Dict[dataset_id, List[StoredFileInfo]]` (since a dataset ID can refer to multiple records). Are you using that API to get all the relevant records?

            I haven't needed the datasetId so didn't think about it before. An optional dataset_id passed to StoredFileInfo,from_record() seems like it would not cause any problems.

            Show
            tjenness Tim Jenness added a comment - The API I use in FileDatastore is _get_stored_records_associated_with_refs which returns `Dict[dataset_id, List [StoredFileInfo] ]` (since a dataset ID can refer to multiple records). Are you using that API to get all the relevant records? I haven't needed the datasetId so didn't think about it before. An optional dataset_id passed to StoredFileInfo,from_record() seems like it would not cause any problems.
            Hide
            salnikov Andy Salnikov added a comment -

            That is not quite about FileDatastore API but about what goes into the DatastoreRecordData structure. If dataset_id is not a part of StoredFileInfo then I'm forced to define it something like:

            class DatastoreRecordData:
                """Per-datastore structure containing its exported records"""
                refs: List[DatasetRef] = []    # DatasetRefs known to a datastore
             
                records: Dict[str, Dict[DatasetId, List[StoredDatastoreItemInfo]]]  # records grouped by table name and dataset Id
            

            Which looks a bit complicated but also not super-generic, e.g. it forces explicit association of records with refs, and I'm worried that this could potentially break if there is an implementation later that use records that don't associate with refs (or associate with multiple refs).

            I think having records definition like this:

                records: Dict[str, List[StoredDatastoreItemInfo]]  # records grouped by table name
            

            could make it more abstract, but it needs that StoredDatastoreItemInfo instances include dataset_id when they need it.

            And an alternative could be to go back to simple dicts:

                records: Dict[str, List[Dict[str, Any]]]  # records grouped by table name
            

            as in Jim's original code.

            Show
            salnikov Andy Salnikov added a comment - That is not quite about FileDatastore API but about what goes into the DatastoreRecordData structure. If dataset_id is not a part of StoredFileInfo then I'm forced to define it something like: class DatastoreRecordData: """Per-datastore structure containing its exported records""" refs: List[DatasetRef] = [] # DatasetRefs known to a datastore   records: Dict[str, Dict[DatasetId, List[StoredDatastoreItemInfo]]] # records grouped by table name and dataset Id Which looks a bit complicated but also not super-generic, e.g. it forces explicit association of records with refs, and I'm worried that this could potentially break if there is an implementation later that use records that don't associate with refs (or associate with multiple refs). I think having records definition like this: records: Dict[str, List[StoredDatastoreItemInfo]] # records grouped by table name could make it more abstract, but it needs that StoredDatastoreItemInfo instances include dataset_id when they need it. And an alternative could be to go back to simple dicts: records: Dict[str, List[Dict[str, Any]]] # records grouped by table name as in Jim's original code.
            Hide
            tjenness Tim Jenness added a comment -

            I did say that it wouldn't seem to be a problem to have an optional dataset ID in the StoredFileInfo. Go for it.

            Show
            tjenness Tim Jenness added a comment - I did say that it wouldn't seem to be a problem to have an optional dataset ID in the StoredFileInfo. Go for it.
            Hide
            salnikov Andy Salnikov added a comment -

            Jim Bosch, I think it's ready for another round. DatastoreRecordData has changed as we discussed, this also needed changes to Quantum class.

            Tim Jenness, could you look at the StoredFileInfo class? I do not particularly like that to_record() method still has DatasetRef argument, feels confusing, as DatasetId is already in StoredFileInfo. Should we drop that argument?

            Show
            salnikov Andy Salnikov added a comment - Jim Bosch , I think it's ready for another round. DatastoreRecordData has changed as we discussed, this also needed changes to Quantum class. Tim Jenness , could you look at the StoredFileInfo class? I do not particularly like that to_record() method still has DatasetRef argument, feels confusing, as DatasetId is already in StoredFileInfo. Should we drop that argument?
            Hide
            salnikov Andy Salnikov added a comment -

            Jenkins succeeded, so finally merged.

            Show
            salnikov Andy Salnikov added a comment - Jenkins succeeded, so finally merged.

              People

              Assignee:
              salnikov Andy Salnikov
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Jim Bosch
              Watchers:
              Andy Salnikov, Jim Bosch, Tim Jenness
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.