  1. Data Management
  2. DM-2065

FY18 Implement Data Verification Tool

    Details

    • Type: Epic
    • Status: To Do
    • Resolution: Unresolved
    • Fix Version/s: None
    • Component/s: Qserv
    • Labels:
      None
    • Epic Name:
      FY18 Qserv Data Verification
    • Story Points:
      90
    • WBS:
      02C.06.02.03
    • Team:
      Data Access and Database

      Description

      We need a tool for verifying whether the data is in a consistent state (e.g., right after loading, after an upgrade, or in general at any given time).

      The things to check include:

      • empty chunk file
      • xrootd exported DB
      • data tables
      • overlap tables
      • data_0123456789 tables
      • existence of the chunkId and subChunkId columns

      Some of the above can be fixed automatically on the spot when a problem is discovered.
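
One possible shape for such a tool, sketched in Python under stated assumptions: each check is a function that returns a list of problems, and a problem may carry an optional repair callback for the cases that can be fixed on the spot. The names `Problem`, `check_required_columns`, and `run_checks`, and the `schemas` input format, are hypothetical and not part of Qserv; only the column names `chunkId` and `subChunkId` come from the list above.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Column names taken from the checklist above.
REQUIRED_COLUMNS = ("chunkId", "subChunkId")


@dataclass
class Problem:
    check: str                                  # name of the check that failed
    detail: str                                 # human-readable description
    fix: Optional[Callable[[], None]] = None    # set only if repairable on the spot


def check_required_columns(schemas):
    """Report tables that lack a required column.

    `schemas` is a hypothetical input: a dict mapping a table name
    to the list of its column names.
    """
    problems = []
    for table, columns in sorted(schemas.items()):
        for col in REQUIRED_COLUMNS:
            if col not in columns:
                problems.append(Problem("columns", f"{table} is missing {col}"))
    return problems


def run_checks(checks, auto_fix=False):
    """Run every check; apply available fixes if requested.

    Returns the problems that were not fixed.
    """
    remaining = []
    for check in checks:
        for problem in check():
            if auto_fix and problem.fix is not None:
                problem.fix()
            else:
                remaining.append(problem)
    return remaining
```

A caller would wrap each check in a zero-argument callable, for example `run_checks([lambda: check_required_columns(schemas)], auto_fix=True)`, and report whatever is returned.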

          Stories in Epic (Custom Issue Matrix)

          Key      Summary                                  Story Points  Assignee    Status
          DM-1898  Consistency checking for table data CSS  4             Unassigned  Invalid

            Activity

            danielw Daniel Wang [X] (Inactive) added a comment -

            Instead of checking for the empty-chunk files, check to make sure the secondary index is valid, meaning that the secondary index should only point at chunks/subchunks that exist on the workers.

            Eventually, the empty chunk information should be derived from the secondary index (or obtained by some similar merging step), so empty chunk files would in essence become caches, and a missing one would not make things inconsistent; it could simply be recreated.
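
A minimal sketch of the check proposed in this comment, assuming the secondary index can be read as a mapping from object ID to a (chunkId, subChunkId) pair and that each worker can report which subchunks it holds per chunk. Both input formats and both function names are hypothetical illustrations, not Qserv APIs.

```python
def find_dangling_entries(secondary_index, worker_chunks):
    """Return index entries that point at a chunk/subchunk no worker holds.

    secondary_index: dict of object_id -> (chunk_id, sub_chunk_id)   (assumed format)
    worker_chunks:   dict of chunk_id -> set of sub_chunk_ids found
                     on the workers                                   (assumed format)
    """
    dangling = []
    for object_id, (chunk_id, sub_chunk_id) in sorted(secondary_index.items()):
        if sub_chunk_id not in worker_chunks.get(chunk_id, set()):
            dangling.append((object_id, chunk_id, sub_chunk_id))
    return dangling


def derive_empty_chunks(secondary_index, all_chunk_ids):
    """Rebuild the empty-chunk list from the index instead of trusting a file.

    A chunk is treated as empty when no index entry refers to it.
    """
    populated = {chunk_id for chunk_id, _ in secondary_index.values()}
    return sorted(set(all_chunk_ids) - populated)
```

With the empty-chunk list derived this way, a missing or stale empty chunk file is not an inconsistency in itself: the file can be regenerated from the output of `derive_empty_chunks`.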

            fritzm Fritz Mueller added a comment -

            A cluster-wide "integrity and cleanup" check (think "fsck for Qserv") would still be useful, and could potentially be implemented on top of the replication system infrastructure.


              People

              Assignee:
              Unassigned
              Reporter:
              fritzm Fritz Mueller
              Watchers:
              Fritz Mueller
              Votes:
              0
