  1. Data Management
  2. DM-15089

Display format for numerical values for LSST data

    Details

    • Story Points:
      4
    • Epic Link:
    • Sprint:
      SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12, SUIT Sprint 2019-01, SUIT Sprint 2019-02, SUIT Sprint 2019-03, SUIT Sprint 2019-04
    • Team:
      Science User Interface

      Description

      This ticket is to gather information for numerical data display for LSST data in SUIT. 

      Firefly currently displays 6 digits after the decimal point by default, which may not suit all data. We need a plan: either specify the precision for each column in the table data, or establish guidelines for the precision of different types of data, e.g. ra, dec, magnitude, flux, error, ...

            Activity

            xiuqin Xiuqin Wu [X] (Inactive) created issue -
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Field Original Value New Value
            Risk Score 0
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Link This issue relates to DM-14743 [ DM-14743 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Sprint SUIT Sprint 2018-08 [ 738 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Sprint SUIT Sprint 2018-09 [ 739 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Rank Ranked higher
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Sprint SUIT Sprint 2018-09 [ 739 ] SUIT Sprint 2018-09, SUIT Sprint 2018-10 [ 739, 740 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Assignee Xiuqin Wu [ xiuqin ] Gregory Dubois-Felsmann [ gpdf ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Assignee Gregory Dubois-Felsmann [ gpdf ] Xiuqin Wu [ xiuqin ]
            gpdf Gregory Dubois-Felsmann added a comment -

            See https://confluence.lsstcorp.org/display/DM/The+Science+Data+Model+and+its+Standardization ; we need to think about whether that metadata model would be an appropriate place to include default output formats for LSST data or not.

            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Sprint SUIT Sprint 2018-09, SUIT Sprint 2018-10 [ 739, 740 ] SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11 [ 739, 740, 741 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Assignee Xiuqin Wu [ xiuqin ] Gregory Dubois-Felsmann [ gpdf ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Labels FireflyCCB-D
            xiuqin Xiuqin Wu [X] (Inactive) added a comment -

            Metadata should certainly be used. I want to know what to do in the absence of metadata. I think the science team should give guidance here.

            gpdf Gregory Dubois-Felsmann added a comment -

            What is the data type in the internal database for floating point numbers? Is it a mix of 32-bit and 64-bit floats, or only one of the two?

            gpdf Gregory Dubois-Felsmann made changes -
            Watchers Gregory Dubois-Felsmann, Xiuqin Wu [ Gregory Dubois-Felsmann, Xiuqin Wu ] Gregory Dubois-Felsmann, Vandana Desai, Xiuqin Wu [ Gregory Dubois-Felsmann, Vandana Desai, Xiuqin Wu ]
            gpdf Gregory Dubois-Felsmann added a comment -

            I'm asking because for 32-bit floats, we should just display the full precision all the time unless there's some explicit override. For 64-bit floats this approach doesn't produce a good user experience and would need some limitation.

            Note that if we implement the "property sheets" for table rows, my inclination is that those should always display the full precision available.

            gpdf Gregory Dubois-Felsmann added a comment -

            Loi Ly I should have pinged you on my November 8th question right away. Can you comment on this one? Do we use both 32-bit and 64-bit floats in the in-memory database?

            To get more concrete about the recommendation, so that the FireflyCCB-D process can proceed:

            32-bit floating point numbers have ~7.2 digits of precision (remember, for instance, that 16,777,216 = 2^24 is the largest integer up to which adding 1 still works for 32-bit floats); in order not to be information-destroying, then, one should by default display them with 8 digits of precision. (And, in writing them to textual formats for downloading, this must be done even if it is decided to limit significance in the online display in some way.)

            In the absence of other formatting information, I would recommend displaying them in something like the Fortran "G" format, so that numbers "near one" are displayed without the exponential notation. (I'm trying to avoid having familiar decimal values of ra, dec, galactic coordinates, and stellar magnitudes displayed with exponents.) I suspect Joe M. may disagree with me on this, though, based on a recent conversation.
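
The precision argument above can be checked with a short Java sketch. The class `FloatPrecisionDemo` and its helper names are illustrative only and are not Firefly code. Java's `%.8g` conversion is used here as the closest readily available analogue of the Fortran "G" descriptor, not as an agreed display format.

```java
import java.util.Locale;

// Illustrative only: class and method names are hypothetical, not Firefly code.
class FloatPrecisionDemo {

    // True if adding 1 to the given float actually changes its value.
    static boolean incrementIsVisible(float value) {
        return value + 1f != value;
    }

    // Formats a value with 8 significant digits using Java's general ("g") conversion.
    static String general8(double value) {
        return String.format(Locale.ROOT, "%.8g", value);
    }

    public static void main(String[] args) {
        System.out.println(incrementIsVisible(16_777_215f)); // true
        System.out.println(incrementIsVisible(16_777_216f)); // false: 2^24 + 1 is not representable
        System.out.println(general8(287.3));   // decimal notation for a value "near one"
        System.out.println(general8(1.2e-9));  // exponential notation for a very small value
    }
}
```

Note that Java's `%g` keeps trailing zeros up to the requested number of significant digits, so it addresses the "no exponents near one" concern but not the question of misrepresenting precision.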

            loi Loi Ly added a comment -

            HSQLDB stores both float and double as 64-bit doubles. However, we do distinguish between them, and floats are mapped back to float when retrieved. So, not applying any format to a float should still yield the 8 digits of precision you described above.
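
A minimal sketch of why the mapping back to float matters, assuming the storage step behaves like a plain Java widening conversion (the class and method names are hypothetical and do not reflect the actual Firefly or HSQLDB code):

```java
// Illustrative only: mimics a float held in a 64-bit column and read back.
class FloatRoundTripDemo {

    // Widen to double, as a 64-bit storage column would hold the value.
    static double store(float value) {
        return (double) value;
    }

    // Narrow back to float on retrieval.
    static float retrieve(double stored) {
        return (float) stored;
    }

    public static void main(String[] args) {
        float original = 10.68f;
        double stored = store(original);
        // Printed as a double, the value exposes binary digits beyond float precision.
        System.out.println(Double.toString(stored));
        // Printed after narrowing, the value reads as originally entered.
        System.out.println(Float.toString(retrieve(stored))); // 10.68
    }
}
```

Widening a float to double is exact, so the round trip loses nothing; the difference is only in how many digits the unformatted string shows.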

            loi Loi Ly added a comment - - edited

            Gregory Dubois-Felsmann I would like to throw in our idea for your consideration.  This came as part of a PR reviewed by Tatiana Goldina.

            We suggest not applying any format when one is not given, and instead using Java's standard toString(). Java's toString() is similar to the `G` format, except that the decimal range is between 10^-3 and 10^7. Outside that range, it uses scientific notation with as many significant digits as needed.
            This ensures there is no loss of data and no trailing 0s. Here's the doc with more details: https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#toString(double)
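
The switch between decimal and scientific notation described above can be seen with a few sample values. This is only a sketch; the class name `ToStringRangeDemo` is made up for illustration.

```java
// Illustrative only: shows where Double.toString changes notation.
class ToStringRangeDemo {

    static String show(double value) {
        return Double.toString(value);
    }

    public static void main(String[] args) {
        System.out.println(show(287.3));      // 287.3
        System.out.println(show(0.001));      // 0.001     (lower edge of the decimal range)
        System.out.println(show(0.0001));     // 1.0E-4    (below 10^-3: scientific)
        System.out.println(show(9999999.0));  // 9999999.0 (just under 10^7: decimal)
        System.out.println(show(1.0e7));      // 1.0E7     (10^7 and above: scientific)
        System.out.println(show(1.2e-9));     // 1.2E-9
    }
}
```

One detail worth keeping in mind: toString() always emits at least one digit after the decimal point, so a whole number appears as 10.0 rather than 10.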

            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Sprint SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11 [ 739, 740, 741 ] SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12 [ 739, 740, 741, 796 ]
            tatianag Tatiana Goldina added a comment - - edited

            Currently Firefly uses %9g for FITS tables; for VOTables it guesses the format, with a minimum of 8 decimal digits. If the first value is in scientific format with 1 digit of precision, all values will be formatted like this.

            To illustrate the problem, consider the following result table:

            <TABLE name="results">     
              <FIELD name="Float Field"  datatype="float"/>     
              <FIELD name="Double Field"  datatype="double"/>     
              <DATA>       
                <TABLEDATA>          
                  <TR><TD>10.68</TD><TD>10.68</TD></TR>
                  <TR><TD>287.3</TD><TD>287.3</TD></TR>
                  <TR><TD>1.2e-9</TD><TD>1.2e-9</TD></TR>
                </TABLEDATA>
              </DATA> 
            </TABLE> 

             Displayed in Firefly, the table will look like this:

            Float Column    Double Column
            10.68000031     10.68000000
            287.29998779    287.30000000
            0.00000000      0.00000000

            I see at least 3 issues here:

            1. Float precision is exceeded.
            2. Trailing zeros misrepresent the precision of the source data.
            3. Guessing takes into account only the first non-null value. If the first value has 8-digit precision, all values will be formatted with this precision.
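
The displayed values above are consistent with applying a fixed 8-decimal format to every cell. The sketch below assumes that is what happens (it uses `%.8f` as a stand-in and is not the actual Firefly formatting code) and compares the result with the unformatted `Float.toString()` output.

```java
import java.util.Locale;

// Illustrative only: a fixed 8-decimal format versus Float.toString().
class FixedFormatDemo {

    // Stand-in for a guessed "8 decimal digits" column format.
    static String fixed8(double value) {
        return String.format(Locale.ROOT, "%.8f", value);
    }

    public static void main(String[] args) {
        float[] floats = {10.68f, 287.3f, 1.2e-9f};
        for (float f : floats) {
            // Left: fixed format (f is widened to double). Right: shortest string that identifies the float.
            System.out.println(fixed8(f) + "    " + Float.toString(f));
        }
    }
}
```

With the fixed format the float values pick up digits beyond their precision and the small value collapses to zero, while the unformatted strings match the source values.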
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Remote Link This issue links to "Display precision ticket in IRSA Jira (Web Link)" [ 19252 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Epic Link DM-13641 [ 38991 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Status To Do [ 10001 ] In Progress [ 3 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Link This issue relates to DM-15274 [ DM-15274 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Sprint SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12 [ 739, 740, 741, 796 ] SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12, SUIT Sprint 2019-01 [ 739, 740, 741, 796, 814 ]
            gpdf Gregory Dubois-Felsmann added a comment -

            We discussed this ticket a bit in the FireflyCCB meeting today.  We are going to talk about this in the FireflyCCB-D advisory group now, possibly under a new ticket in IPAC Jira (which we'll post here).

            gpdf Gregory Dubois-Felsmann made changes -
            Labels FireflyCCB-D FireflyCCB FireflyCCB-D
            tatianag Tatiana Goldina added a comment -

            When deciding how to be user friendly, we should understand that whatever algorithm we choose to limit the precision of the displayed numbers, there will be a scenario (a set of numbers) in which it fails. Hence, if we limit the precision, we should allow users to change it. We should also allow users to see the original data from the data provider. Preserving the original precision in the absence of other information is therefore the first step toward being user friendly.

            We should avoid displaying wrong data or misrepresenting precision.

            Loi has suggested using Java’s toString(). There is no loss of data and no trailing zeros. The disadvantage is that the formatting of a column might be inconsistent: since trailing zeros are omitted, the numbers in a column might appear with different numbers of decimal places. Also, the choice between scientific and decimal format depends on the data value; see https://docs.oracle.com/javase/7/docs/api/java/lang/Double.html#toString(double)

            Using Java’s toString() works very well when the input data arrives as strings. With binary data (such as a binary FITS table), we’d display the number as stored. For example, if 0.1 is represented by 0.0999999999999659, we’d display 0.0999999999999659. This is not user friendly, but as long as we are not displaying wrong data, and we are displaying exactly what we have received, the rounding issues can be addressed later by allowing users to adjust the format.

            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Sprint SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12, SUIT Sprint 2019-01 [ 739, 740, 741, 796, 814 ] SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12, SUIT Sprint 2019-01, SUIT Sprint 2019-02 [ 739, 740, 741, 796, 814, 815 ]
            frossie Frossie Economou made changes -
            Status Admin Review [ 3 ] In Progress [ 11605 ]
            frossie Frossie Economou made changes -
            Status Review [ 11605 ] In Progress [ 3 ]
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Story Points 4
            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Sprint SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12, SUIT Sprint 2019-01, SUIT Sprint 2019-02 [ 739, 740, 741, 796, 814, 815 ] SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12, SUIT Sprint 2019-01, SUIT Sprint 2019-02, SUIT Sprint 2019-3 [ 739, 740, 741, 796, 814, 815, 860 ]
            tatianag Tatiana Goldina made changes -
            Link This issue relates to DM-18489 [ DM-18489 ]
            ejoliet Emmanuel Joliet added a comment -

            Gregory Dubois-Felsmann, could you post the CCB ticket here? I can't find it. Thanks.

            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Sprint SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12, SUIT Sprint 2019-01, SUIT Sprint 2019-02, SUIT Sprint 2019-03 [ 739, 740, 741, 796, 814, 815, 860 ] SUIT Sprint 2018-09, SUIT Sprint 2018-10, SUIT Sprint 2018-11, SUIT Sprint 2018-12, SUIT Sprint 2019-01, SUIT Sprint 2019-02, SUIT Sprint 2019-03, SUIT Sprint 2019-04 [ 739, 740, 741, 796, 814, 815, 860, 861 ]
            xiuqin Xiuqin Wu [X] (Inactive) added a comment -

            DM-15826 recognizes the VOTable precision attribute for values, and Firefly will respect that attribute. Specifically, that ticket states:

            Note: the values of the precision and width attributes are concatenated as 'wwEn' or 'wwFn' and stored as 'precision' in DataType, where 'ww' is the value of the width attribute and 'En' or 'Fn' is the value of the precision attribute. If the original precision value contains only numeric digits, such as '2', it is converted to 'F2'.
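
The concatenation rule in the note can be sketched as follows. The helper `combine` and the class `PrecisionSpecDemo` are hypothetical and are not the DM-15826 implementation. Treating a missing width as an empty prefix is an assumption, since the note does not say how that case is handled.

```java
// Illustrative only: hypothetical helper following the rule quoted from DM-15826.
class PrecisionSpecDemo {

    // Combines VOTable width and precision attribute values into one string.
    static String combine(String width, String precision) {
        // Assumption: a missing attribute contributes nothing to the result.
        String w = (width == null) ? "" : width.trim();
        String p = (precision == null) ? "" : precision.trim();
        // A digits-only precision such as "2" is treated as fixed notation: "F2".
        if (p.matches("\\d+")) {
            p = "F" + p;
        }
        return w + p;
    }

    public static void main(String[] args) {
        System.out.println(combine("10", "E3")); // 10E3
        System.out.println(combine("8", "2"));   // 8F2
        System.out.println(combine(null, "2"));  // F2
    }
}
```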

            xiuqin Xiuqin Wu [X] (Inactive) made changes -
            Resolution Done [ 10000 ]
            Status In Progress [ 3 ] Done [ 10002 ]

              People

              Assignee:
              gpdf Gregory Dubois-Felsmann
              Reporter:
              xiuqin Xiuqin Wu [X] (Inactive)
              Watchers:
              Emmanuel Joliet, Gregory Dubois-Felsmann, Loi Ly, Tatiana Goldina, Vandana Desai, Xiuqin Wu [X] (Inactive)
