Data Management / DM-18418

Read the EFD Related Documents


    Details

    • Type: Story
    • Status: Done
    • Resolution: Done
    • Fix Version/s: None
    • Component/s: ts_middleware
    • Labels:

      Description

      Review the EFD related documents and attend the related meetings.

        Attachments

          Activity

          ktl Kian-Tat Lim added a comment -

          Something else to look at along these lines might be TimescaleDB.

          ttsai Te-Wei Tsai added a comment - edited

          GitHub of TimescaleDB: https://github.com/timescale/timescaledb

          Comparison between TimescaleDB and InfluxDB:
          https://blog.timescale.com/time-series-data-why-and-how-to-use-a-relational-database-instead-of-nosql-d0cd6975e87c/
          https://blog.timescale.com/timescaledb-vs-influxdb-for-time-series-data-timescale-influx-sql-nosql-36489299877/
          ttsai Te-Wei Tsai added a comment -

          Based on the conversation with Andres, the EFD shall support non-blocking reads in Tier 1. MySQL Cluster can fulfill the reliability requirement automatically; InfluxDB's clustered version requires a paid license.

          ttsai Te-Wei Tsai added a comment -

          The baseline of the EFD is to have 4 servers with 32 GB of RAM each.

          Andres lists the restrictions of MySQL Cluster for the EFD implementation here:

          https://confluence.lsstcorp.org/pages/viewpage.action?spaceKey=LTS&title=MySQL+Cluster+limitations

          The main problem is the RAM restriction from the primary key (PK) indexes. Taking M1M3 as an example, the usage is 1.2 GB of RAM/day. Andres' calculation is quoted below:

          ----------------------------------------------------------------------------------------

          In the link there is a section that says:

          PRIMARY KEY (PK)

          Every MySQL Cluster table must have a primary key (PK). If you do not create one, MySQL Cluster creates one for you with a size of 8 bytes. Every PK causes a hash index (HI), which has a size of 20 bytes. HIs are stored in index memory, while all other information is stored in data memory.
          A PK also creates an ordered index (OI) unless you create it with USING HASH.

          Does this mean that every row of the table has 20 bytes just for the index? If so, and assuming that the PK is of datatype DATETIME (+8 bytes), each row has 28 bytes just for the index that would go directly to RAM.

          With this said, plus an example I have for ATDome_command_moveAzimuth, where I have 5,136,720 rows

          (not considering non-index data, because it can go to disk)

          Data in RAM = (20 + 8) * 5,136,720 / 1024 / 1024 [MB]

          Data in RAM = 137 MB (with commandLog it is twice this value)


          Then, if we consider the other tables plus the other TIERs, it seems to me that it is only possible to handle a limited amount of data (if we are careful, maybe just TIER 1) before RAM fills up.


          Using a concrete example (M1M3), there are 11 telemetry topics published at 50 Hz (there are also events at 50 Hz, but I'm only considering telemetry for this exercise), then:

          data/day for each table = 50 * 60 * 60 * 24 = 4,320,000 rows

          Then for 11 topics: 11 * 4,320,000 = 47,520,000 rows, or about 1.27 GB of RAM/day.
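          As a sanity check (my own sketch, not part of the ticket), the arithmetic above can be reproduced in a few lines of Python. The 28 bytes/row figure (20-byte hash index entry plus an 8-byte DATETIME primary key) and the row counts come from Andres' numbers; note that dividing by 1024 consistently gives about 1.24 GB/day, slightly below the 1.27 GB quoted above.

```python
# Per-row RAM cost of indexes in MySQL Cluster, per Andres' numbers:
HASH_INDEX_BYTES = 20   # per-row hash index entry (index memory)
PK_BYTES = 8            # DATETIME primary key (data memory)
BYTES_PER_ROW = HASH_INDEX_BYTES + PK_BYTES  # 28 bytes/row held in RAM

def ram_mb(rows: int) -> float:
    """RAM consumed by index data for `rows` rows, in MB (1024-based)."""
    return rows * BYTES_PER_ROW / 1024 / 1024

# ATDome_command_moveAzimuth example: 5,136,720 rows
print(f"{ram_mb(5_136_720):.0f} MB")            # ~137 MB

# M1M3 telemetry: 11 topics published at 50 Hz for one day
rows_per_day = 50 * 60 * 60 * 24                # 4,320,000 rows per table
total_rows = 11 * rows_per_day                  # 47,520,000 rows
print(f"{ram_mb(total_rows) / 1024:.2f} GB/day")  # ~1.24 GB/day
```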

          aanania Andres Anania [X] (Inactive) added a comment -

          Te-Wei participated in the meeting with INRIA, reviewed the documentation, and asked many good questions with ideas for improvement.


            People

            Assignee:
            ttsai Te-Wei Tsai
            Reporter:
            ttsai Te-Wei Tsai
            Reviewers:
            Andres Anania [X] (Inactive)
            Watchers:
            Andres Anania [X] (Inactive), Kian-Tat Lim, Simon Krughoff, Te-Wei Tsai, Tim Jenness

              Dates

              Created:
              Updated:
              Resolved:
              Start date:
              End date:

                Jenkins

                No builds found.