Data Management / DM-10554

Calculate impact of proposed DMS-REQ-0350

    Details

    Team: Architecture

      Description

      In DM-4597 we agreed on a new DM requirement: each data release should include a cross-match table mapping Objects in the current data release to Objects in each of the previous data releases. This has now been included in LCR-962, and a request has been made to estimate its impact on construction staff effort and on the sizing model.

            Activity

            ktl Kian-Tat Lim added a comment -

            I'd prefer if we baselined as simple a matching algorithm as possible. In that case, the additional storage would be at most a subset of the columns of the Object tables from previous DRs (if they're not already available) plus the match tables themselves, which should be no more than a few TB. The worst case I can think of is 10 DRs × 3 potential matches × (8-byte ids + 4-byte match probabilities) × 40G objects ≈ 14 TB.
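
            The worst-case figure quoted here can be reproduced with a few lines of arithmetic. This is only a sketch of that estimate, not part of the sizing model: the variable and function names are made up for illustration, and a TB is assumed to be 10**12 bytes.

```python
# Worst-case size of the cross-match tables, using the numbers in the comment.
# All names are illustrative; 1 TB is assumed to be 10**12 bytes.
N_OBJECTS = 40 * 10**9    # 40G Objects in the current data release
N_PREVIOUS_DRS = 10       # previous data releases to match against
MATCHES_PER_OBJECT = 3    # potential matches kept per Object per previous DR
ID_BYTES = 8              # id of the matched Object
PROB_BYTES = 4            # match probability


def match_table_bytes(n_objects=N_OBJECTS,
                      n_drs=N_PREVIOUS_DRS,
                      matches=MATCHES_PER_OBJECT,
                      bytes_per_match=ID_BYTES + PROB_BYTES):
    """Total bytes needed to store every potential match."""
    return n_objects * n_drs * matches * bytes_per_match


total_tb = match_table_bytes() / 10**12
print(f"{total_tb:.1f} TB")  # 14.4 TB
```

            The estimate counts only the matched id and the probability for each potential match, as in the comment; any extra per-row columns or index overhead would add to it.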

            tjenness Tim Jenness added a comment -

            So a few hundred dollars in storage.

            My main concern is that each DR will be deeper than the one before, so new sources will appear. It seems that this cross-matcher must, at minimum, include the flux in the match so that we don't make the obvious mistake of matching a bright star in DR1 that has some small proper motion with a faint galaxy that first appeared in DR11. I don't think a truly naive cross-matcher will be useful. John Swinbank, does the DRP team have an opinion on this?
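
            A toy sketch of the failure mode described in this comment, assuming each Object is reduced to a position and a single magnitude (used as a stand-in for flux). The thresholds, field names, and catalogue values are hypothetical, the separation uses a small-angle approximation that ignores RA wrap-around, and this is not the DRP matching algorithm.

```python
import math


def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Small-angle separation in arcseconds; inputs are in degrees."""
    dra = (ra1 - ra2) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    ddec = dec1 - dec2
    return math.hypot(dra, ddec) * 3600.0


def match(obj, candidates, max_sep_arcsec=1.0, max_dmag=0.5):
    """Return the id of the closest candidate whose magnitude also agrees.

    Passing max_dmag=float("inf") gives a position-only (naive) match.
    Returns None if no candidate passes both cuts.
    """
    best_id, best_sep = None, max_sep_arcsec
    for cand in candidates:
        sep = angular_sep_arcsec(obj["ra"], obj["dec"], cand["ra"], cand["dec"])
        if sep <= best_sep and abs(obj["mag"] - cand["mag"]) <= max_dmag:
            best_id, best_sep = cand["id"], sep
    return best_id


# A bright star from an early DR and two Objects from a later, deeper DR.
star = {"id": 1, "ra": 10.0, "dec": -30.0, "mag": 15.0}
later = [
    {"id": 101, "ra": 10.00005, "dec": -30.0, "mag": 26.5},  # faint new galaxy, nearest in position
    {"id": 102, "ra": 10.00020, "dec": -30.0, "mag": 15.1},  # the same star after a small proper motion
]

naive = match(star, later, max_dmag=float("inf"))  # picks the faint galaxy
with_flux = match(star, later)                     # picks the star
```

            With these made-up values the position-only match selects the faint galaxy because it is nearer, while adding the magnitude cut recovers the star.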

            tjenness Tim Jenness added a comment -

            That comment triggered a memory. I saw a talk at ADASS last year from Gaia where they used a classifier with their cross-matcher (including the light curves) and got much more accurate matches.

            http://www.adass2016.inaf.it/index.php/participant-list/14-talk/135-rimoldini-lorenzo

            Wil O'Mullane, do you know whether that talk was written up?

            womullan Wil O'Mullane added a comment (edited) -

            Concerning the talk: there are reports on the topic in the Gaia Livelink (DocuShare), though not under that specific title.
            The short ADASS write-up is available at https://arxiv.org/pdf/1702.04165.pdf
            I can put you or someone else in touch with Laurent Eyer (CU7 leader) for more details. I do not know Rimoldini directly, though his email is in the paper.

            tjenness Tim Jenness added a comment -

            Pinging John Swinbank for the DRP opinion on the cross-matching. For LCR-962 we need an estimate of the impact of this requirement on the DRP team.

            Kian-Tat Lim, who puts this information in the sizing model? I assume that for LCR-962 we declare that this has no impact on the hardware budget.

            swinbank John Swinbank added a comment -

            I don't think there's a significant new algorithmic component here: we can reuse the matching algorithms we have to develop anyway for Object generation, which will be sensitive to position, flux, etc. There will be some extra effort to integrate those algorithms into a tool which can actually do the job, but that should be relatively modest.

            tjenness Tim Jenness added a comment -

            I'm happy with that if our matcher is matching flux. I'd be happier if it matched the light curves (with the caveat that a DR10-to-DR11 match will be much more accurate than a DR1-to-DR11 match, because of the greater light curve overlap).

            tjenness Tim Jenness added a comment -

            Closing this. LCR-962 is now up for a vote.


              People

              Assignee: Kian-Tat Lim (ktl)
              Reporter: Tim Jenness (tjenness)
              Watchers: John Swinbank, Kian-Tat Lim, Tim Jenness, Wil O'Mullane