Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-417

finish adding aliases to afw::table::Schema

    XMLWordPrintable

    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: afw
    • Labels:

      Description

      This issue picks up the partially-completed Trac Ticket #2351, which adds a string-substitution-based alias mechanism to afw::table::Schema. There are still some issues to sort out w.r.t. constness - in other respects, a Table or Catalog's schema cannot be changed once the Table has been constructed, but we do need to be able to change aliases after table construction.

        Attachments

          Issue Links

            Activity

            Hide
            jbosch Jim Bosch added a comment -

            Because no good deed ever goes unpunished, and Russell already discussed this some with me, I'm asking him to review it.

            This adds a new class, AliasMap, to afw::table. An AliasMap is held by a Schema, where it's used when field names are looked up. Aliases are simple string-to-string mappings that are also searched when a field is searched for, and can match not just the full field name but an incomplete prefix as well. There's also code here to notify owning Tables when an alias changes (this is in anticipation of DM-419, in which we'll use these aliases to define slots), and to persist the aliases with the Schema.

            Changes are all in afw u/jbosch/DM-417:

            afw:u/jbosch/DM-417 % git diff --stat master...u/jbosch/DM-417
             .gitignore                                 |    1 +
             include/lsst/afw/fits.h                    |    4 +-
             include/lsst/afw/table/AliasMap.h          |   98 +++++++++++++++++++
             include/lsst/afw/table/BaseTable.h         |    7 +-
             include/lsst/afw/table/Schema.h            |   48 ++++++++--
             include/lsst/afw/table/detail/SchemaImpl.h |    4 +-
             python/lsst/afw/table/Base.i               |   40 ++++++++
             src/table/AliasMap.cc                      |   86 +++++++++++++++++
             src/table/BaseTable.cc                     |    6 ++
             src/table/Schema.cc                        |   40 ++++++--
             src/table/io/FitsReader.cc                 |   20 ++++
             src/table/io/FitsWriter.cc                 |    7 ++
             tests/testSimpleTable.py                   |    3 +-
             tests/testTableAliases.cc                  |   86 +++++++++++++++++
             tests/testTableAliases.py                  |  143 ++++++++++++++++++++++++++++
             15 files changed, 570 insertions(+), 23 deletions(-)

            Show
            jbosch Jim Bosch added a comment - Because no good deed ever goes unpunished, and Russell already discussed this some with me, I'm asking him to review it. This adds a new class, AliasMap, to afw::table. An AliasMap is held by a Schema, where it's used when field names are looked up. Aliases are simple string-to-string mappings that are also searched when a field is searched for, and can match not just the full field name but an incomplete prefix as well. There's also code here to notify owning Tables when an alias changes (this is in anticipation of DM-419 , in which we'll use these aliases to define slots), and to persist the aliases with the Schema. Changes are all in afw u/jbosch/ DM-417 : afw:u/jbosch/DM-417 % git diff --stat master...u/jbosch/DM-417 .gitignore | 1 + include/lsst/afw/fits.h | 4 +- include/lsst/afw/table/AliasMap.h | 98 +++++++++++++++++++ include/lsst/afw/table/BaseTable.h | 7 +- include/lsst/afw/table/Schema.h | 48 ++++++++-- include/lsst/afw/table/detail/SchemaImpl.h | 4 +- python/lsst/afw/table/Base.i | 40 ++++++++ src/table/AliasMap.cc | 86 +++++++++++++++++ src/table/BaseTable.cc | 6 ++ src/table/Schema.cc | 40 ++++++-- src/table/io/FitsReader.cc | 20 ++++ src/table/io/FitsWriter.cc | 7 ++ tests/testSimpleTable.py | 3 +- tests/testTableAliases.cc | 86 +++++++++++++++++ tests/testTableAliases.py | 143 ++++++++++++++++++++++++++++ 15 files changed, 570 insertions(+), 23 deletions(-)
            Hide
            rowen Russell Owen added a comment - - edited

            DM-417 review

            I cannot comment on the utility of the changes, since I don't have a very good understanding of what problem aliases are intended to solve now. Your comments do say that eventually they will be used to define slots, which sounds promising. Your comments refer to DM-419, but that issue unfortunately contains no description so the link isn't yet informative.

            On the whole the code looks very nice, but I would like to see some documentation enhancements. Specifics:

            Schema:

            • Schema.find: I'm unhappy that you have changed it to mutate the name. How often will the user actually care what the expanded name is? If not often, my first inclination would be to force them to call something else to get the expanded name. But if you think a caller often needs to know it, or it would not be reliable to have to expand it separately, then I strongly suggest that you provide the original const-name version find(std::string const &name) _plus_ either of two variant find methods:
            • find(std::string &expandedName, std::string const &name): expanded name is put in an output string; this is my preferred choice (even though I am not keen on output arguments) because one can use a const name to find, it makes it clear the caller really wants the expanded name, and there is less chance of surprising the user
            • find(std::string &name): name is expanded in place (as you have it now); simple but potentially rudely surprising
            • Schema.getAliasMap returns a map that that is a shared pointer to the version inside Schema; modifying the returned value modifies the one inside the Schema. Exposing internals like that worries me, but I'm betting you had good reason to do it this way, such as sharing alias maps between schemas? (I admit I'm not sure why that's useful, though it sounds interesting). But just in case it is practical, I'll mention an obvious safer-but-clumsier alternative: return a copy, which the user can modify as they like and then call Schema.setAliasMap to make changes.

            AliasMap:

            • What does AliasMap.apply do? I found the description too vague. My guess is that it's like AliasMap.get but expands abbreviations. However, it also apparently modifies the input argument; if so, this seems a misfeature, especially since it returns the result. Please document what happens if there is no match.
            • In the same vein, please document AliasMap::_apply in AliasMap.cc. Why does it modify the input argument instead of returning a new string? What does it do if there is no match? Also, as a nit-pick, I think the code would easier to read if it used boost::string::starts_with.
            • What does AliasMap.get do if there is no match?
            • What does AliasMap.remove do if there is no match?
            • AliasMap.cc defines anonymous function startsWith but does not use it. Also, why define a new one instead of using boost::string::starts_with instead?

            Other:

            • testTableAliases.py: nit-pick: sys, os and numpy are imported but not used (let's hear it for pyflakes).
            Show
            rowen Russell Owen added a comment - - edited DM-417 review I cannot comment on the utility of the changes, since I don't have a very good understanding of what problem aliases are intended to solve now. Your comments do say that eventually they will be used to define slots, which sounds promising. Your comments refer to DM-419 , but that issue unfortunately contains no description so the link isn't yet informative. On the whole the code looks very nice, but I would like to see some documentation enhancements. Specifics: Schema: Schema.find : I'm unhappy that you have changed it to mutate the name. How often will the user actually care what the expanded name is? If not often, my first inclination would be to force them to call something else to get the expanded name. But if you think a caller often needs to know it, or it would not be reliable to have to expand it separately, then I strongly suggest that you provide the original const-name version find(std::string const &name) _ plus _ either of two variant find methods: find(std::string &expandedName, std::string const &name) : expanded name is put in an output string; this is my preferred choice (even though I am not keen on output arguments) because one can use a const name to find, it makes it clear the caller really wants the expanded name, and there is less chance of surprising the user find(std::string &name) : name is expanded in place (as you have it now); simple but potentially rudely surprising Schema.getAliasMap returns a map that that is a shared pointer to the version inside Schema; modifying the returned value modifies the one inside the Schema. Exposing internals like that worries me, but I'm betting you had good reason to do it this way, such as sharing alias maps between schemas? (I admit I'm not sure why that's useful, though it sounds interesting). But just in case it is practical, I'll mention an obvious safer-but-clumsier alternative: return a copy, which the user can modify as they like and then call Schema.setAliasMap to make changes. AliasMap: What does AliasMap.apply do? I found the description too vague. My guess is that it's like AliasMap.get but expands abbreviations. However, it also apparently modifies the input argument; if so, this seems a misfeature, especially since it returns the result. Please document what happens if there is no match. In the same vein, please document AliasMap::_apply in AliasMap.cc . Why does it modify the input argument instead of returning a new string? What does it do if there is no match? Also, as a nit-pick, I think the code would easier to read if it used boost::string::starts_with. What does AliasMap.get do if there is no match? What does AliasMap.remove do if there is no match? AliasMap.cc defines anonymous function startsWith but does not use it. Also, why define a new one instead of using boost::string::starts_with instead? Other: testTableAliases.py : nit-pick: sys , os and numpy are imported but not used (let's hear it for pyflakes).
            Hide
            jbosch Jim Bosch added a comment -

            Thanks for the very quick review!

            Schema.find: I'm unhappy that you have changed it to mutate the name...

            Actually, I haven't: because I'm passing the string by value, not by reference, I'm actually modifying a temporary. Doing it this way instead of passing by const reference and creating an explicit temporary just gives the compiler more options how how to optimize the function. It's a fine line between premature optimization and good (but perhaps confusingly advanced) C++ practice, so if you'd prefer I use const reference and an explicit temporary I'd be happy to make the switch. This goes for AliasMap::apply() as well.

            Schema.getAliasMap returns a map that that is a shared pointer to the version inside Schema; modifying the returned value modifies the one inside the Schema. Exposing internals like that worries me, but I'm betting you had good reason to do it this way, such as sharing alias maps between schemas? (I admit I'm not sure why that's useful, though it sounds interesting). But just in case it is practical, I'll mention an obvious safer-but-clumsier alternative: return a copy, which the user can modify as they like and then call Schema.setAliasMap to make changes.

            This is the ugliness I settled on after our conversation yesterday; I think it was either this, something uglier, or a major refactor. While the the fact that Schemas share aliases may be confusing in some (hopefully rare) cases, it's what we want 99% of the time, because copying two Schemas usually just happens because you've called e.g. catalog.getSchema(), and that returns a copy. So this lets us modify the AliasMap through that copy, which is what we want.

            What does AliasMap.apply do? I found the description too vague. My guess is that it's like AliasMap.get but expands abbreviations. However, it also apparently modifies the input argument; if so, this seems a misfeature, especially since it returns the result. Please document what happens if there is no match.

            You guessed right about what it does, and I'll try to improve the docs to make that more obvious (now that I think about it, it really needs an example). As with Schema::find, as discussed above, it doesn't actually modify the input argument (well, the private _apply() implementation does, but the public one does not).

            What does AliasMap.get do if there is no match?

            Throws NotFoundError. Will document.

            What does AliasMap.remove do if there is no match?

            Hmm; I think it does nothing. I'll check that, and make sure the C++ version is consistent with STL semantics and the Python "del" overload is consistent with what's expected there (an exception, I think). And I'll document it.

            AliasMap.cc defines anonymous function startsWith but does not use it. Also, why define a new one instead of using boost::string::starts_with instead?

            Oops, that's a relic that should be deleted. I didn't know about boost::string::starts_with (thanks!) when I added it, but then it turned out that the logic there was too intertwined with the search algorithm to separate.

            testTableAliases.py: nit-pick: sys, os and numpy are imported but not used (let's hear it for pyflakes).

            I'll remove them.

            Show
            jbosch Jim Bosch added a comment - Thanks for the very quick review! Schema.find: I'm unhappy that you have changed it to mutate the name... Actually, I haven't: because I'm passing the string by value, not by reference, I'm actually modifying a temporary. Doing it this way instead of passing by const reference and creating an explicit temporary just gives the compiler more options how how to optimize the function. It's a fine line between premature optimization and good (but perhaps confusingly advanced) C++ practice, so if you'd prefer I use const reference and an explicit temporary I'd be happy to make the switch. This goes for AliasMap::apply() as well. Schema.getAliasMap returns a map that that is a shared pointer to the version inside Schema; modifying the returned value modifies the one inside the Schema. Exposing internals like that worries me, but I'm betting you had good reason to do it this way, such as sharing alias maps between schemas? (I admit I'm not sure why that's useful, though it sounds interesting). But just in case it is practical, I'll mention an obvious safer-but-clumsier alternative: return a copy, which the user can modify as they like and then call Schema.setAliasMap to make changes. This is the ugliness I settled on after our conversation yesterday; I think it was either this, something uglier, or a major refactor. While the the fact that Schemas share aliases may be confusing in some (hopefully rare) cases, it's what we want 99% of the time, because copying two Schemas usually just happens because you've called e.g. catalog.getSchema(), and that returns a copy. So this lets us modify the AliasMap through that copy, which is what we want. What does AliasMap.apply do? I found the description too vague. My guess is that it's like AliasMap.get but expands abbreviations. However, it also apparently modifies the input argument; if so, this seems a misfeature, especially since it returns the result. Please document what happens if there is no match. You guessed right about what it does, and I'll try to improve the docs to make that more obvious (now that I think about it, it really needs an example). As with Schema::find , as discussed above, it doesn't actually modify the input argument (well, the private _apply() implementation does, but the public one does not). What does AliasMap.get do if there is no match? Throws NotFoundError. Will document. What does AliasMap.remove do if there is no match? Hmm; I think it does nothing. I'll check that, and make sure the C++ version is consistent with STL semantics and the Python "del" overload is consistent with what's expected there (an exception, I think). And I'll document it. AliasMap.cc defines anonymous function startsWith but does not use it. Also, why define a new one instead of using boost::string::starts_with instead? Oops, that's a relic that should be deleted. I didn't know about boost::string::starts_with (thanks!) when I added it, but then it turned out that the logic there was too intertwined with the search algorithm to separate. testTableAliases.py: nit-pick: sys, os and numpy are imported but not used (let's hear it for pyflakes). I'll remove them.
            Hide
            rowen Russell Owen added a comment -

            I should have realized that the std::string name arguments were passed by value. I assume _apply mutates its argument as an efficiency?

            I'll leave it to you. If you call these functions often enough then the potential efficiency increase may be worth the confusion. Otherwise I'd switch. If you keep it then a brief note might be in order (depending on how often you use this pattern and how obvious you think it'll be to others that you are doing this on purpose, and what that purpose is).

            Show
            rowen Russell Owen added a comment - I should have realized that the std::string name arguments were passed by value. I assume _apply mutates its argument as an efficiency? I'll leave it to you. If you call these functions often enough then the potential efficiency increase may be worth the confusion. Otherwise I'd switch. If you keep it then a brief note might be in order (depending on how often you use this pattern and how obvious you think it'll be to others that you are doing this on purpose, and what that purpose is).
            Hide
            jbosch Jim Bosch added a comment -

            Merged to master. I did switch to const reference in Schema::find() and AliasMap::apply(); those weren't likely to be performance critical methods anyway.

            Show
            jbosch Jim Bosch added a comment - Merged to master. I did switch to const reference in Schema::find() and AliasMap::apply(); those weren't likely to be performance critical methods anyway.

              People

              Assignee:
              jbosch Jim Bosch
              Reporter:
              jbosch Jim Bosch
              Reviewers:
              Russell Owen
              Watchers:
              Jim Bosch, Russell Owen
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved:

                  Jenkins

                  No builds found.