# Handling time in Gen3 middleware


## Details

• Type: RFC
• Status: Implemented
• Resolution: Done
• Component/s:
• Labels:
None

## Description

The Gen3 middleware will enable the use of time in the query/selector used for quantum graph generation.  This requires times to be present in the registry database and means to be provided for users to specify times.

There are four obvious candidates for the representation of times in the registry database:

• Database internal DATETIME/TIMESTAMP fields, typically required to be UTC.  Allows use of database functions for handling times.
• astropy.time internal format composed of two double-precision floating-point numbers representing an MJD in either UTC or TAI timescales.
• A single double-precision floating-point number representing an MJD, in UTC or TAI.
• lsst.daf.base.DateTime internal format composed of a single 64-bit integer counting nanoseconds since the Unix epoch in either UTC or TAI timescales.
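
As a rough comparison of the two single-number candidates above, the following sketch (standard library only, with an arbitrary sample MJD) estimates the resolution of a single double-precision MJD and the span that a signed 64-bit nanosecond count can cover. It is an illustration under those assumptions, not a statement about any particular implementation.

```python
import math

SECONDS_PER_DAY = 86400

# Spacing between adjacent doubles near a present-day MJD, expressed in seconds.
mjd = 58893.0  # arbitrary sample value
mjd_resolution_s = math.ulp(mjd) * SECONDS_PER_DAY

# Span of a signed 64-bit nanosecond counter on either side of its epoch, in years.
int64_span_years = 2**63 / 1e9 / SECONDS_PER_DAY / 365.25

print(f"double MJD resolution: {mjd_resolution_s * 1e9:.0f} ns")
print(f"int64 ns span: +/- {int64_span_years:.0f} years")
```

A single double MJD resolves roughly 0.6 microseconds at current dates, while the integer count keeps nanosecond resolution over roughly 292 years on either side of the epoch.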

Similarly, there are three candidates for human-specified time literals:

• ISO8601 string (YYYY-MM-DDTHH:mm:ss.ssssssZ).
• An MJD numeric literal, possibly with a prefix or suffix to indicate the type.
• A nanosecond numeric literal, again possibly with a prefix/suffix.

There are implementation desires for the time literal to be identifiable as such to the parser and for all time literals to be translated into a single internal representation within the quantum graph generator.

I think it is also desirable for the quantum graph selector expression to be close to ADQL/SQL rather than inventing a new language.  Our requirements say that user-facing times do not need to be TAI, but pipeline-internal times should be TAI.

Note that the database representation can differ between the registry database and any metadata database published to the DAC and science users, although we may be able to reduce duplication by having those two be implemented using a single underlying set of database tables.

I propose that the human-specified literals be either ISO8601 with an explicit timezone indicator or MJD (single-column) numeric, both in the UTC timescale. If a type indicator is necessary for the latter, prefixing the number with the literal string "MJD" could be acceptable, though it deviates from SQL/ADQL. I also expect these literal forms to be used in FITS headers where needed. Literal nanoseconds are too user-hostile to be supported.

As long as the registry database is non-public, I propose that the database representation be either DB-native DATETIME in UTC or lsst.daf.base.DateTime integer nanoseconds, assuming that the DateTime class can be used and new time-handling/translation code is not needed.  The schema creator should choose between native and nanoseconds depending on the expected usage of the column.

Since DateTime provides conversions to and from UTC and MJD, it can be used as the internal representation within the parser and quantum graph generator.

## Activity

Tim Jenness added a comment -

If we are using nanoseconds internally, I would strongly prefer that doing so not require daf_butler to depend on daf_base.

Russell Owen added a comment - - edited

Note that the EFD will use TAI. I would argue that using TAI everywhere is best for consistency and fewer headaches. FITS supports it and indeed I thought we were writing TAI dates to our FITS files (in ISO format).

I have no opinion about the internal format. Our DDS messages represent TAI as Unix seconds in a double, for what that is worth. That mildly suggests that higher precision formats may not be very helpful.

Andy Salnikov added a comment -

I'm not sure that we will ever need to support time duration or intervals natively in the expression parser, but it may be beneficial to think about this possibility too.

Colin Slater added a comment -

> MJD (single-column) numeric, both in the UTC timescale.

I would support as a general principle that any user-facing, continuous-looking float or integer (MJD, seconds-since-epoch, etc) should be TAI whenever possible, and that UTC only appear when necessary in ways that look like civil datetimes (e.g. ISO-8601 or a SQL timestamp). UTC MJD is both definitionally not valid (I'd argue) and going to cause confusion against the much more scientifically important TAI-based MJD.

Eric Bellm added a comment -

I will note for completeness here that the times we use for computing timeseries features, in particular for period folding, will need to be barycentered TDB, and so the TDB times will be implicitly user-facing as well. I will mention some options for doing so in DMTN-118.

I am fine with TAI as an internal representation.

John Parejko added a comment -

Adding another vote for the TAI train: I think that both user facing and internal representations in TAI will prevent many headaches over the course of the survey. Relatedly, option 1 says:

> Database internal DATETIME/TIMESTAMP fields, typically required to be UTC.

Do databases really require UTC, or do they just require something without a timezone, such that we could safely put in TAI? How do databases typically handle leap seconds and conversions to/from TAI?

Jim Bosch added a comment -

I have been thinking lately that we might want to start using sphgeom's nice RangeSet class (used for ranges of spherical pixelization integers) to represent sets of timespans (with timestamps as integers, probably) in some places.  That's a very slight point in favor of using integers or something exactly round-trippable through integers, but this is probably only a relevant argument if you're desperate for a tiebreaker.
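
To illustrate why integer timestamps suit this kind of structure, here is a minimal sketch of merging half-open integer timespans into a normalized set. The function name `merge_timespans` and the `Timespan` alias are hypothetical, and this is not sphgeom's `RangeSet` API, only the general idea of a sorted set of non-overlapping integer ranges.

```python
from typing import Iterable, List, Tuple

Timespan = Tuple[int, int]  # half-open [begin, end) in integer nanoseconds


def merge_timespans(spans: Iterable[Timespan]) -> List[Timespan]:
    """Return sorted, non-overlapping timespans covering the same instants."""
    merged: List[Timespan] = []
    for begin, end in sorted(spans):
        if begin >= end:
            continue  # skip empty spans
        if merged and begin <= merged[-1][1]:
            # Overlaps or touches the previous span: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((begin, end))
    return merged


spans = [(100, 200), (150, 250), (250, 300), (400, 500)]
print(merge_timespans(spans))  # [(100, 300), (400, 500)]
```

With integer endpoints, "touching" spans can be detected by exact comparison, which is harder to guarantee with floating-point times.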

Kian-Tat Lim added a comment - - edited

After discussion with Wil, here is (my characterization of) our take on this. Apologies for the long delay.

Basically, we agree with Andy Salnikov.

First, the internal representation in a database is most useful when it is a single field. Multiple fields generally require either error-prone in-query assembly or use of a user-defined function. Having the field be a database-native DATETIME/TIMESTAMP is attractive as it provides access to native functions, but the difference between UTC and TAI may cause those functions (e.g. number of seconds in an interval) to be subtly incorrect. We think the safest choice for internal representation is 64-bit integer nanoseconds in the TAI timescale, as implemented in lsst.daf.base.DateTime. If dependencies on this package are undesirable, that C++ class and its Python wrapping could be extracted. It would also be possible to write a pure-Python implementation, perhaps using astropy.time for MJD and other approximate conversions but not for fundamental operations where it would introduce errors (because 1 ns is not exactly expressible as a fractional day). An additional, derived native DATETIME column for other uses is acceptable; it should follow the database-native conventions, virtually always meaning it should be UTC. John Parejko, putting TAI into such columns will likely cause errors.

Second, the human interface as used in a query expression should use commonly-accepted literal formats. It should allow for future extensibility. We therefore agree with the proposal to allow ISO-format date-times with timezone indicator and MJD numbers prefixed with "MJD". The only question is whether these should be in the TAI timescale or the UTC timescale. The safest choice for ISO format may be to align with the definition of observing day, which is currently specified relative to UTC (minus 12 hours). Having MJD be in a different timescale is potentially worrisome but Colin Slater is convincing.
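
The concern above about native functions computing intervals incorrectly can be sketched with a leap second. In this example Python's `datetime` stands in for a leap-second-unaware native date-time type. Whether a given database behaves the same way is an assumption that would need checking. The TAI-UTC offsets are the published values on either side of the leap second at the end of 2016.

```python
from datetime import datetime, timezone

# A leap second (23:59:60 UTC) was inserted at the end of 2016-12-31.
before = datetime(2016, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, 0, 0, 1, tzinfo=timezone.utc)

# datetime arithmetic assumes every day has exactly 86400 seconds.
naive_elapsed_s = (after - before).total_seconds()

# TAI-UTC was 36 s before the leap second and 37 s after it.
TAI_MINUS_UTC_BEFORE = 36
TAI_MINUS_UTC_AFTER = 37
true_elapsed_s = naive_elapsed_s + (TAI_MINUS_UTC_AFTER - TAI_MINUS_UTC_BEFORE)

print(naive_elapsed_s, true_elapsed_s)  # 2.0 3.0
```

The UTC-based subtraction is one second short of the elapsed time, whereas differences of TAI nanosecond counts would not need any correction.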

Russell Owen added a comment -

FYI: the EFD is using TAI. I thought we were using TAI everywhere, in order to avoid weird errors when computing time differences.

Andy Salnikov added a comment -

For the human interface and ISO time specification, I'd also prefer to have a literal type that is distinguishable from a regular string type; this would simplify both parsing and transformation into database-level constructs. To be consistent with MJD, I'd probably add a special prefix to the string literal; 'T' may be an obvious choice. Here are a few examples of how it can be specified in a data section expression:

```
T'2020-02-14 02:37:37'
T'20200214T023737'
MJD58893.109456
```

(other suggestions are welcome of course).

As discussed with K-T, this syntax is intended for the pipetask expression parser and should not be treated as a common standard; e.g., if some application accepts MJD-only data, it is reasonable and convenient to specify it as a number without a prefix.
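
A minimal sketch of how prefixed literals of this kind could be recognized with a regular expression follows. It is not the pipetask expression parser. The pattern, the function name `find_time_literals`, and the column name `obs_time` in the sample expression are all hypothetical.

```python
import re
from typing import List, Tuple

# Hypothetical pattern: T'...' for ISO-style strings, MJD<number> for MJD values.
LITERAL = re.compile(r"\bT'(?P<iso>[^']*)'|\bMJD(?P<mjd>\d+(?:\.\d*)?)")


def find_time_literals(expression: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs for each time literal, in order of appearance."""
    found = []
    for match in LITERAL.finditer(expression):
        if match.group("iso") is not None:
            found.append(("iso", match.group("iso")))
        else:
            found.append(("mjd", match.group("mjd")))
    return found


expr = "obs_time > T'2020-02-14 02:37:37' AND obs_time < MJD58893.2"
print(find_time_literals(expr))
# [('iso', '2020-02-14 02:37:37'), ('mjd', '58893.2')]
```

Because the prefixes make each literal self-identifying, the recognizer does not need to guess whether a quoted string or a bare number is meant as a time.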

Tim Jenness added a comment -

I really don't want a daf_base dependency so could we use microsecond integers rather than nanosecond integers and use astropy.time? There is no scenario I can imagine that would be dealing with multiple datasets per nanosecond (and I'm sure that milliseconds would be fine).

Tim Jenness added a comment -

Kian-Tat Lim I'm happy for this RFC to be adopted with an integer time if we use astropy and not daf_base. I'll leave it up to you to determine whether we really need nanoseconds in the registry rather than microseconds.

Kian-Tat Lim added a comment - - edited

The following code is small and self-explanatory enough that it can be placed in daf_butler without needing a new package.

```python
import astropy.time

epoch = astropy.time.Time("1970-01-01T00:00:00", format="isot", scale="tai")


def from_isot_utc(isotutc: str) -> int:
    moment = astropy.time.Time(isotutc, format="isot", scale="utc")
    return int(round((moment - epoch).to_value("sec") * 1e9))


def to_isot_utc(nsecs: int) -> str:
    moment = astropy.time.TimeDelta(nsecs * 1e-9, format="sec") + epoch
    moment.precision = 6
    return moment.utc.isot + "Z"


def from_mjd_tai(mjd: float) -> int:
    moment = astropy.time.Time(mjd, format="mjd", scale="tai")
    return int(round((moment - epoch).to_value("sec") * 1e9))


def to_mjd_tai(nsecs: int) -> float:
    moment = astropy.time.TimeDelta(nsecs * 1e-9, format="sec") + epoch
    moment.precision = 6
    return moment.mjd
```

The results from this code are consistent with daf_base DateTime in the domain of interest, and they are consistent with DateTime's MJD conversion accuracy of 1 microsecond or better.
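
The sub-microsecond (rather than nanosecond) accuracy mentioned above can be illustrated without astropy: an integer nanosecond count of present-day magnitude cannot pass through a single double unchanged. The following is a standard-library sketch with an arbitrary sample value. It says nothing about astropy's internal two-double arithmetic.

```python
import math

# An arbitrary nanosecond count of present-day magnitude (about 1.58e18).
nsecs = 1_581_647_857_123_456_789

# Spacing between adjacent doubles at this magnitude, in nanoseconds.
spacing_ns = math.ulp(float(nsecs))

# Round trip through a double, as happens when an integer count is scaled as a float.
round_trip_error_ns = abs(int(float(nsecs)) - nsecs)

print(spacing_ns, round_trip_error_ns)
```

At this magnitude adjacent doubles are 256 ns apart, so a value that passes through a float is good to well under a microsecond but not to the nanosecond.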

I'm not sure why I was made the assignee for this RFC, but I will go ahead and adopt it, with the already-blocked ticket as the implementation.

Andy Salnikov added a comment -

Do we have a clear agreement on what units we should use for the integer representation in the database? I think Tim was saying that microseconds will work OK, but K-T's examples use nanoseconds. Could we summarize the decision unambiguously here?

Kian-Tat Lim added a comment -

Please use nanoseconds in the database, but it should be expected that values from the user, and many values in, e.g., headers, are only accurate to the microsecond.


## People

• Assignee: Kian-Tat Lim
• Reporter: Kian-Tat Lim
• Watchers: Andy Salnikov, Christopher Stephens, Christopher Waters, Colin Slater, Eric Bellm, Fritz Mueller, Jim Bosch, John Parejko, Kian-Tat Lim, Russell Owen, Tim Jenness
• Votes: 0
