Uploaded image for project: 'Data Management'
  1. Data Management
  2. DM-17949

Firefly TAP browser: UI planning for future support of joins

    XMLWordPrintable

Details

    • Story
    • Status: To Do
    • Resolution: Unresolved
    • None
    • Firefly, SUIT
    • None

    Description

      I understand that there's some thinking going on about possible rearrangement of the components of the prototype TAP browser UI, and in this context I heard a remark that the upper-left pane, that is, the schema-and-table selection pane, has some blank space that we could think about reclaiming.

      This is to point out that at some point we'll want to facilitate joins in the UI, based on the foreign-key information in TAP_SCHEMA.

      The currently empty space below the "Tables" pull-down would be exactly the place for this. I imagine that below that pull-down there would be UI elements for selecting a table to join to and the keys to use for the join.

      I've started a sketch for that which I'll upload as an image.

      Once the join is defined, Firefly would have to determine what the joined tables' effective schema would be, either by analyzing the schemas of the two tables and computing the combination itself, or by issuing a MAXREC=0 query for the join against the TAP service. With that in hand, the UI's table-of-columns-and-constraints (the current bottom half) could be populated with the joined schema, and then all its other features (e.g., column selection, constraint definition) would proceed as usual. The spatial-constraint and temporal-constraint search helpers (upper right pane) should also be able to work against the joined schema. This is all pretty well-defined and users would have very natural expectations.

      Getting the joined-table schema itself defined may be the hardest part of this whole task.

      Implementing this is NOT an LSST close-out task (sadly). But we should not give the UI space away without at least taking this into account.

      Attachments

        Issue Links

          Activity

            The sketch I've attached shows a maximalist fantasy, where the UI supports both by-key and spatial ("fuzzy") joins, with that choice under the control of radio buttons.

            1) The by-key joins, which would be easier to implement and my higher priority, would offer tables eligible to be joined to the "primary" table that was selected, based on the content of the TAP_SCHEMA.keys table, which describes which tables have foreign-key links to each other.

            Once a second table is chosen, there will be one or more actual key pairs that can be used for the join, as defined in the TAP_SCHEMA.key_columns table. The final drop-down allows the user to make the choice of these, if there is a choice available.

            (I didn't show this in my sketch, but on reflection the "Upload" function I sketched for the spatial join also makes sense for by-key joins. However, here, without TAP_SCHEMA.key_columns metadata available, the user would have to be offered the choice of any columns in either table to be used to perform the foreign-key join. That would require two key drop-downs, instead of a single key-pair drop-down.)

            2) For spatial joins, the first step would be to define a target table. In this case it will be extremely common for the user to want this to be a transiently uploaded private table, so I showed an upload button.

            Once the two tables have been defined, a fairly complex UI that probably will not fit in this pane would be needed to define the columns in each table that would be used for the spatial ("neighborhood") join. Some of the logic of the existing spatial-constraints helper would be appropriate here, as the job would include identifying columns that are "ra-like" and "dec-like" to be used in the spatial distance calculation.

            It might also be useful to be able to perform one-dimensional "fuzzy joins", e.g., on time.

            All of this second mode requires a lot of careful thought. I would recommend deploying a version of the TAP explorer with the by-key joins first, and then add the spatial joins in a further round of work. Support of joins to user tables in the by-key joins could/should also be deferred.

            gpdf Gregory Dubois-Felsmann added a comment - The sketch I've attached shows a maximalist fantasy, where the UI supports both by-key and spatial ("fuzzy") joins, with that choice under the control of radio buttons. 1) The by-key joins, which would be easier to implement and my higher priority, would offer tables eligible to be joined to the "primary" table that was selected, based on the content of the TAP_SCHEMA.keys table, which describes which tables have foreign-key links to each other. Once a second table is chosen, there will be one or more actual key pairs that can be used for the join, as defined in the TAP_SCHEMA.key_columns table. The final drop-down allows the user to make the choice of these, if there is a choice available. (I didn't show this in my sketch, but on reflection the "Upload" function I sketched for the spatial join also makes sense for by-key joins. However, here, without TAP_SCHEMA.key_columns metadata available, the user would have to be offered the choice of any columns in either table to be used to perform the foreign-key join. That would require two key drop-downs, instead of a single key-pair drop-down.) 2) For spatial joins, the first step would be to define a target table. In this case it will be extremely common for the user to want this to be a transiently uploaded private table, so I showed an upload button. Once the two tables have been defined, a fairly complex UI that probably will not fit in this pane would be needed to define the columns in each table that would be used for the spatial ("neighborhood") join. Some of the logic of the existing spatial-constraints helper would be appropriate here, as the job would include identifying columns that are "ra-like" and "dec-like" to be used in the spatial distance calculation. It might also be useful to be able to perform one-dimensional "fuzzy joins", e.g., on time. All of this second mode requires a lot of careful thought. I would recommend deploying a version of the TAP explorer with the by-key joins first, and then add the spatial joins in a further round of work. Support of joins to user tables in the by-key joins could/should also be deferred.

            This is still valid and is on the agreed work plan for the next 9-12 months.

            gpdf Gregory Dubois-Felsmann added a comment - This is still valid and is on the agreed work plan for the next 9-12 months.

            Support for creation of JOINs in Firefly (RSP Portal Aspect) in the context of building ADQL queries on the "Edit ADQL" screen was added in August 2023 (Firefly release 2023.2.x), under DM-32041 and is in production. We continue to incrementally add more foreign-key metadata to the Felis data models, to take advantage of this capability.

            This ticket remains about the creation of a more user-friendly non-ADQL-expert means of building JOIN queries based on metadata documenting cross-table relationships in the data, and it is still valid. It is queued behind the refresh of the Firefly UX being done in early 2024.

            gpdf Gregory Dubois-Felsmann added a comment - Support for creation of JOINs in Firefly (RSP Portal Aspect) in the context of building ADQL queries on the "Edit ADQL" screen was added in August 2023 (Firefly release 2023.2.x), under DM-32041 and is in production. We continue to incrementally add more foreign-key metadata to the Felis data models, to take advantage of this capability. This ticket remains about the creation of a more user-friendly non-ADQL-expert means of building JOIN queries based on metadata documenting cross-table relationships in the data, and it is still valid. It is queued behind the refresh of the Firefly UX being done in early 2024.

            Note that a related functionality was deployed in the Portal in 2023 as well: an entirely UI-based, no-ADQL-knowledge-needed, ability to JOIN an externally provided table to a system table based on either an equality-join on a key or a spatial-neighbor join on coordinates.

            This is currently only visible on DP0.3 because it depends on the TAP/UWS "UPLOAD" capability which we only have available for Postgres-backed TAP services.

            However, it is an example of a way of providing a JOIN-oriented UI in a way that looks quite different from the sketch on this ticket.

            In the sketch above, the user workflow is:

            1. User picks a table to search (e.g., Object)
            2. Portal tells the user immediately "this table can be joined to other tables on this service" (e.g., ForcedSource, or CoaddPatches)
            3. User chooses "match spatially" vs. "match by object ID"
            4. Portal tells the user what other tables they can join to

            If the approach currently taken for external tables were extended, instead of the above the workflow would be:

            1. User picks a table to search
            2. User navigates to the future version of the "search helpers" (the lower left of the "UI-assisted" TAP search screen)
            3. User sees the existing options, e.g., "spatial", "temporal", "search by ID", and picks the one they want
            4. If they pick "multi-object search" under "spatial" (or - in the future - a "list of times" under "temporal") in addition to the current "upload table" option they would also get an option to "join" (or "cross-match") against a list of other tables already on the service
            5. If they pick "search by ID" they would get an option to join to the list of other tables on the service for which a foreign-key relationship is documented (e.g., Object-to-ForcedSource, Object-to-CoaddPatches, Source-to-CcdVisit).

            I still like the original approach, but there are appealing features to the latter approach as well. I don't think we should do both – that's likely to be very confusing to users.

            What's important to me in this is discoverability – I want users to have a hard time not noticing that this capability exists.

            Before we start working on an implementation, it would be helpful to talk through these alternatives, if not also other ideas, with a few interested parties from the RSP community. (We'll want to do this on the IPAC side, too.)

            gpdf Gregory Dubois-Felsmann added a comment - Note that a related functionality was deployed in the Portal in 2023 as well: an entirely UI-based, no-ADQL-knowledge-needed, ability to JOIN an externally provided table to a system table based on either an equality-join on a key or a spatial-neighbor join on coordinates. This is currently only visible on DP0.3 because it depends on the TAP/UWS "UPLOAD" capability which we only have available for Postgres-backed TAP services. However, it is an example of a way of providing a JOIN-oriented UI in a way that looks quite different from the sketch on this ticket. In the sketch above, the user workflow is: User picks a table to search (e.g., Object) Portal tells the user immediately "this table can be joined to other tables on this service" (e.g., ForcedSource, or CoaddPatches) User chooses "match spatially" vs. "match by object ID" Portal tells the user what other tables they can join to If the approach currently taken for external tables were extended, instead of the above the workflow would be: User picks a table to search User navigates to the future version of the "search helpers" (the lower left of the "UI-assisted" TAP search screen) User sees the existing options, e.g., "spatial", "temporal", "search by ID", and picks the one they want If they pick "multi-object search" under "spatial" (or - in the future - a "list of times" under "temporal") in addition to the current "upload table" option they would also get an option to "join" (or "cross-match") against a list of other tables already on the service If they pick "search by ID" they would get an option to join to the list of other tables on the service for which a foreign-key relationship is documented (e.g., Object-to-ForcedSource, Object-to-CoaddPatches, Source-to-CcdVisit). I still like the original approach, but there are appealing features to the latter approach as well. I don't think we should do both – that's likely to be very confusing to users. What's important to me in this is discoverability – I want users to have a hard time not noticing that this capability exists. Before we start working on an implementation, it would be helpful to talk through these alternatives, if not also other ideas, with a few interested parties from the RSP community. (We'll want to do this on the IPAC side, too.)

            People

              Unassigned Unassigned
              gpdf Gregory Dubois-Felsmann
              Colin Slater, Gregory Dubois-Felsmann, Luisa Rebull, Vandana Desai
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:

                Jenkins

                  No builds found.