I decided that dragging ORDER BY columns from a sub-query to a parent query (or materialized query) is too much effort, so instead I add a computed column to the result of a query which contains rank of a record. That rank is calculated using SQL window function as row_number() OVER (ORDER BY columns). It is much easier to handle a single column with a fixed name. The query that we generate for queryDimensionRecords will now look like:
SELECT
|
main_20210215.exposure.instrument,
|
main_20210215.exposure.id,
|
-- ... more exposure columns
|
main_20210215.exposure.zenith_angle,
|
main_20210215.exposure.timespan,
|
c._orderby AS _orderby
|
FROM
|
main_20210215.exposure
|
JOIN (
|
SELECT
|
main_20210215.physical_filter.band AS band,
|
main_20210215.physical_filter.instrument AS instrument,
|
main_20210215.physical_filter.name AS physical_filter,
|
main_20210215.exposure.id AS exposure,
|
row_number() OVER (ORDER BY main_20210215.exposure.day_obs ASC) AS _orderby -- this is the new column
|
FROM
|
(
|
SELECT
|
main_20210215.dataset_tags_00000002.instrument AS instrument,
|
main_20210215.dataset_tags_00000002.detector AS detector,
|
main_20210215.dataset_tags_00000002.exposure AS exposure
|
FROM
|
main_20210215.dataset_tags_00000002
|
WHERE
|
main_20210215.dataset_tags_00000002.dataset_type_id = %(dataset_type_id_1)s
|
AND main_20210215.dataset_tags_00000002.collection_name = %(collection_name_1)s
|
) AS raw
|
JOIN main_20210215.exposure ON raw.instrument = main_20210215.exposure.instrument
|
AND raw.exposure = main_20210215.exposure.id
|
JOIN main_20210215.physical_filter ON raw.instrument = main_20210215.physical_filter.instrument
|
AND main_20210215.exposure.instrument = main_20210215.physical_filter.instrument
|
AND main_20210215.exposure.physical_filter = main_20210215.physical_filter.name
|
WHERE
|
main_20210215.exposure.day_obs > %(param_1)s
|
AND raw.instrument = %(instrument_1)s
|
AND main_20210215.exposure.instrument = %(instrument_2)s
|
AND main_20210215.physical_filter.instrument = %(instrument_3)s
|
ORDER BY
|
_orderby
|
) AS c ON main_20210215.exposure.instrument = c.instrument
|
AND main_20210215.exposure.id = c.exposure
|
ORDER BY
|
_orderby
|
New column _orderby is used for ordering in both sub-query and outer query. It is not strictly needed in a sub-query if LIMIT is not used, but I still keep it there (my idea that in many cases when ORDER BY is used LIMIT will also be present).
This only works correctly if the is only one sub-query (no UNIONs or joins with other JOINs) which I believe is true for queryDimensionRecords().
Ignoring the order_by and limit parameters in https://github.com/lsst/daf_butler/blob/main/python/lsst/daf/butler/registry/queries/_results.py#L159 looks suspicious but maybe not the proximate cause of this.