Details
-
Type:
Story
-
Status: Done
-
Resolution: Done
-
Fix Version/s: None
-
Component/s: Qserv
-
Labels:
-
Story Points:6
-
Epic Link:
-
Sprint:DB_S17_6
-
Team:Data Access and Database
Description
Objectives
Evaluating implications of various storage technologies (including hardware RAID of spinning disks, SSD and NVMe) on the performance of the secondary index loading (assuming the present InnoDB-based implemented).
Compile a detailed report for the study at:
Post technical details and a progress of the study in comments attached by this issue.
Resources
The study will be conduced on a special node ccqserv150.in2p3.fr of the Qserv cluster in IN2P3. This node is equipped with the following file systems:
- a redundant array (RAID6) of conventional spinning disks (7.3 TB)
- a single SSD disk (3.0 TB)
- a singe NVMe storage module (1.9 TB)
All three file systems are listed below:
% df -h |
Filesystem Size Used Avail Use% Mounted on
|
/dev/nvme0n1 1.9T 99G 1.8T 6% /qserv_nvm |
/dev/sdb 3.0T 96G 2.9T 4% /qserv_ssd |
/dev/sdc 7.3T 97G 7.2T 2% /qserv |
Other key parameters of the machine:
- Memory: 512 GB
- CPU model: Intel(R) Xeon(R) CPU E5-2680 v4
- CPU (base) frequency: 2.40GHz
- L3 cache size: 32 MB
- Number of processors/cores: dual CPU, 16 physical cores per CPU (32 in total), HT enabled (64 hardware threads in total)
Attachments
Issue Links
Activity
Sequential index loading : /qserv_ssd
Setting up
Set up a link to an input folder with TSV files as:
% cd /qserv_ssd/gapon |
% ls -al | grep tsv |
drwxr-xr-x 2 gapon babar 3448832 Jun 20 00:59 local_tsv
|
lrwxrwxrwx 1 gapon babar 26 Jun 25 03:26 tsv -> /qserv_nvm/gapon/local_tsv |
Configured database service with:
% cat /etc/my.cnf |
[mysqld]
|
|
datadir=/qserv_ssd/var/lib/mysql |
tmpdir=/qserv_ssd/mysql_tmpdir |
|
innodb_file_per_table=1
|
innodb_buffer_pool_size=128G
|
innodb_buffer_pool_instances=32
|
...
|
Restarted the service:
% sudo systemctl restart mysql.service |
Created the database and the secondary index table:
CREATE DATABASE secondary_index; |
CREATE TABLE IF NOT EXISTS secondary_index.sequential ( |
`source_id` CHAR(28) NOT NULL, |
`chunkId` int(11) DEFAULT NULL, |
`subChunkId` int(11) DEFAULT NULL, |
PRIMARY KEY (`source_id`) |
) ENGINE=InnoDB DEFAULT CHARSET=latin1; |
Loading
% cd /qserv_ssd/gapon |
% mkdir log |
% nohup ./scripts/load_sequential.bash $PWD secondary_index sequential <user> <password> >& log/load_sequential.log& |
Monitoring
Beginning
The number pf TSV files have been loaded so far:
% ls -1 log/ | grep tsv | wc -l |
192
|
Disk I/O:
% iostat -m 1 | grep sdb |
sdb 157.00 0.00 37.25 0 37
|
sdb 171.00 0.00 32.22 0 32
|
sdb 192.00 0.00 40.27 0 40
|
sdb 529.00 0.00 91.14 0 91
|
sdb 515.00 0.00 86.12 0 86
|
sdb 366.00 0.00 71.36 0 71
|
sdb 284.00 0.00 48.30 0 48
|
sdb 403.00 0.00 78.94 0 78
|
CPU utilization:
% top |
%Cpu(s): 0.7 us, 0.1 sy, 0.0 ni, 99.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st |
170646 mysql 20 0 0.137t 0.011t 10968 S 79.5 2.2 1:59.32 mysqld
|
Table file size:
% sudo ls -alh /qserv_ssd/var/lib/mysql/secondary_index |
total 4.1G
|
drwx------ 2 mysql leftuser 61 Jun 25 03:30 .
|
drwxr-xr-x 13 mysql mysql 355 Jun 25 03:30 ..
|
-rw-rw---- 1 mysql leftuser 65 Jun 25 03:30 db.opt
|
-rw-rw---- 1 mysql leftuser 1014 Jun 25 03:30 sequential.frm
|
-rw-rw---- 1 mysql leftuser 2.3G Jun 25 03:36 sequential.ibd
|
After loading 48098 files, 6 hours later
Database table size on disk:
% sudo ls -alh ../var/lib/mysql/secondary_index/ |
-rw-rw---- 1 mysql leftuser 217G Jun 25 09:43 sequential.ibd
|
Disk I/O:
% iostat -m 1 | grep sdb |
sdb 424.00 0.00 55.56 0 55
|
sdb 619.00 0.00 81.55 0 81
|
sdb 531.00 0.00 54.77 0 54
|
sdb 402.00 0.00 60.68 0 60
|
sdb 122.00 0.00 21.84 0 21
|
CPU utilization:
% top |
%Cpu(s): 1.5 us, 0.2 sy, 0.0 ni, 98.2 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st |
KiB Mem : 52808342+total, 1625620 free, 30018172+used, 22627606+buff/cache |
170646 mysql 20 0 0.137t 0.135t 10928 S 87.1 27.4 328:29.37 mysqld
|
After loading 71277 files, 12 hours later
Database table size on disk:
% sudo ls -alh ../var/lib/mysql/secondary_index/ |
-rw-rw---- 1 mysql leftuser 447G Jun 25 17:33 sequential.ibd
|
Disk I/O:
% iostat -m 1 | grep sdb |
sdb 767.00 0.00 75.12 0 75
|
sdb 633.00 0.00 75.56 0 75
|
sdb 716.00 0.00 79.98 0 79
|
sdb 678.00 0.00 78.38 0 78
|
sdb 730.00 0.00 74.55 0 74
|
CPU utilization:
% top |
%Cpu(s): 1.3 us, 0.2 sy, 0.0 ni, 98.4 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st |
KiB Mem : 52808342+total, 1390064 free, 30013580+used, 22655753+buff/cache |
170646 mysql 20 0 0.137t 0.135t 10524 S 92.1 27.4 728:30.23 mysqld
|
Performance, observations
Total run time: 20:38.45
Table file size: 674 GB
Loading performance is presented below as the number of TSV files loaded within the 30 minute intervals and the I/O rate of reading the input TSV files averaged over the corresponding interval:
3835 2.6
|
4215 2.8
|
4267 2.8
|
4198 2.7
|
4280 2.8
|
4045 2.7
|
4056 2.8
|
3855 2.8
|
3772 2.9
|
3596 2.8
|
3396 2.8
|
3147 2.8
|
3079 2.8
|
2859 2.8
|
2479 2.8
|
2270 2.8
|
2117 2.7
|
1959 2.7
|
1303 2.4
|
587 1.7
|
1041 2.2
|
1899 2.7
|
1622 2.6
|
710 2.1
|
364 1.5
|
194 1.1
|
741 2.0
|
1216 2.5
|
1565 2.6
|
1614 2.5
|
1866 2.6
|
2021 2.6
|
2342 2.6
|
2560 2.7
|
2750 2.7
|
2962 2.7
|
3040 2.7
|
3150 2.7
|
3336 2.7
|
3409 2.7
|
3740 2.7
|
1435 1.0
|
Sequential index loading : /qserv_nvm
Setting up
Set up a link to an input folder with TSV files as:
% cd /qserv_nvm/gapon |
% ls -al | grep tsv |
drwxr-xr-x 2 gapon babar 3448832 Jun 25 03:26 local_tsv
|
lrwxrwxrwx 1 gapon babar 26 Jun 26 08:40 tsv -> /qserv_ssd/gapon/local_tsv |
Configured database service with:
% cat /etc/my.cnf |
[mysqld]
|
|
datadir=/qserv_nvm/var/lib/mysql |
tmpdir=/qserv_nvm/mysql_tmpdir |
|
innodb_file_per_table=1
|
innodb_buffer_pool_size=128G
|
innodb_buffer_pool_instances=32
|
...
|
Restarted the service:
% sudo systemctl restart mysql.service |
ATTENTION: this didn't quite work due to a strange configuration of the service. As a result of this 3 instances of the service were running simultaneously:
% ps -ef | grep mysql |
root 12688 1 0 Jun23 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/qserv/var/lib/mysql --pid-file=/qserv/var/lib/mysql/ccqserv150.pid |
mysql 12799 12688 31 Jun23 ? 18:29:36 /usr/sbin/mysqld --basedir=/usr --datadir=/qserv/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/log/mariadb/mariadb.log --pid-file=/qserv/var/lib/mysql/ccqserv150.pid --socket=/var/lib/mysql/mysql.sock |
root 137974 1 0 09:06 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/qserv_nvm/var/lib/mysql --pid-file=/qserv_nvm/var/lib/mysql/ccqserv150.pid |
mysql 138085 137974 7 09:06 ? 00:00:13 /usr/sbin/mysqld --basedir=/usr --datadir=/qserv_nvm/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysq --log-error=/var/log/mariadb/mariadb.log --pid-file=/qserv_nvm/var/lib/mysql/ccqserv150.pid --socket=/var/lib/mysql/mysql.sock |
gapon 138253 130030 0 09:09 pts/0 00:00:00 grep --color=auto mysql |
root 170535 1 0 Jun25 ? 00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/qserv_ssd/var/lib/mysql --pid-file=/qserv_ssd/var/lib/mysql/ccqserv150.pid |
mysql 170646 170535 60 Jun25 ? 18:04:14 /usr/sbin/mysqld --basedir=/usr --datadir=/qserv_ssd/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysq --log-error=/var/log/mariadb/mariadb.log --pid-file=/qserv_ssd/var/lib/mysql/ccqserv150.pid --socket=/var/lib/mysql/mysql.sock |
Hence, the other two had to be manually removed before restarting the desired service:
% sudo systemctl stop mysql.service |
% sudo kill 12799 12688 170646 170535 |
% sudo systemctl start mysql.service |
Created the database and the secondary index table:
CREATE DATABASE secondary_index; |
CREATE TABLE IF NOT EXISTS secondary_index.sequential ( |
`source_id` CHAR(28) NOT NULL, |
`chunkId` int(11) DEFAULT NULL, |
`subChunkId` int(11) DEFAULT NULL, |
PRIMARY KEY (`source_id`) |
) ENGINE=InnoDB DEFAULT CHARSET=latin1; |
Loading
% cd /qserv_nvm/gapon |
% mkdir log |
% nohup ./scripts/load_sequential.bash $PWD secondary_index sequential <user> <password> >& log/load_sequential.log& |
Monitoring
After loading 68273 files, 10 hours later
Database table size on disk:
% sudo ls -alh ../var/lib/mysql/secondary_index/ |
-rw-rw---- 1 mysql leftuser 394G Jun 26 19:17 sequential.ibd
|
Disk I/O:
% iostat -m 1 | grep nvm |
nvme0n1 1478.00 0.00 115.59 0 115
|
nvme0n1 575.00 0.00 52.26 0 52
|
nvme0n1 1988.00 0.00 133.81 0 133
|
nvme0n1 530.00 0.05 52.17 0 52
|
nvme0n1 376.00 0.00 41.00 0 41
|
nvme0n1 375.00 0.00 29.10 0 29
|
nvme0n1 916.00 0.00 74.96 0 74
|
nvme0n1 1360.00 0.00 101.04 0 101
|
nvme0n1 574.00 0.00 46.04 0 46
|
CPU utilization:
% top |
%Cpu(s): 1.3 us, 0.1 sy, 0.0 ni, 98.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st |
KiB Mem : 52808342+total, 1198704 free, 15674534+used, 37013936+buff/cache |
138480 mysql 20 0 0.137t 0.133t 10792 S 73.5 27.0 498:16.32 mysqld
|
Performance, observations
Total run time: 17:59.05
Table file size: 674 GB
Loading performance is presented below as the number of TSV files loaded within the 30 minute intervals and the I/O rate of reading the input TSV files averaged over the corresponding interval:
3819 3.0
|
4194 3.0
|
1519 1.0
|
3028 2.1
|
5247 3.5
|
5157 3.4
|
5120 3.3
|
5037 3.4
|
4889 3.3
|
4627 3.3
|
4301 3.3
|
4044 3.3
|
3819 3.2
|
3612 3.2
|
3289 3.2
|
2881 3.2
|
2561 3.1
|
2395 3.1
|
2001 3.0
|
1097 2.4
|
811 2.0
|
1961 3.0
|
1904 3.0
|
807 2.3
|
364 1.6
|
383 1.5
|
990 2.4
|
1599 2.9
|
1842 2.9
|
1997 2.9
|
2354 3.0
|
2625 3.0
|
2945 3.0
|
3206 3.0
|
3351 3.0
|
3508 3.0
|
3607 3.0
|
Summary and conclusions
In a context of this effort the low-level performance of 4 file systems have been evaluated for a variety of access patters. These are observations made during the tests:
- the only filesystem of the Qserv master node in PDAC has very poor performance practically in all test modes (NOTE: the random I/O performance hasn't been tested), which is at least 1 order of magnitude lower (actually - much lower) compared with any of three other tested file systems.
- the other 3 file system have demonstrated roughly equal performance for the sequential+direct write mode for all records sizes used in during the tests. The read performance of the traditional RAID6 array of spinning disks is noticeably lower in the sequential+direct mode. The discrepancy is explained by the large cache of the LSI RAID card configured in the write back mode (no wait before data are flushed to the disks while the cache is still not full) and writing test files which are smaller than the size of the cache.
- NOTE: this may need to be retested for the larger files.
Three filesystems (which were the primary focus of the investigation) have also been tested under a heavy load of multiple clients (256):
- the random I/O performance of the NVMe and SSD file systems is comparable.
- the performance of the traditional RAID6 array is 10 to 20 times lower.
- lower CPU utilization has been seen when testing NVMe
The performance of the tradition sequential protocol for loading the secondary index tables has been tested for all 3 file systems using yeh same input dataset (a collection of over 106,000 of TSV files). These numbers represent the loading time for each type of storage:
storage | load time |
---|---|
NVMe | 17:59.05 |
SSD | 20:38.45 |
RAID6 | 23:07.30 |
An attempt to optimize the resulting tables failed due to an extremely poor performance of the OPTIMIZE TABLE operation.
Conclusions:
- the NVMe and SSD technologies have a clear advantage over the traditional RAID6 for the random direct I/O under heavily parallel load
- the difference between NVMe and SSD in the low-level performance is insignificant (probably due to a fact that two SSD disks were put into the striped RAID0 configuration using the hardware RAID controller)
- all three file systems have demonstrated good performance during the sequential loading of the secondary index. Tough, NVMe has allowed a noticeable (though, not dramatic) reduction of the overall loading time over the others
- the overall loading time reduction of either of those file systems compared with the one of the current Qserv master in PDAC is roughly 10 times lower.
All planned tests have been complete.
TODO:
- Compile a detailed report for the study at: [https://confluence.lsstcorp.org/pages/viewpage.action?pageId=62259715
- Test the parallel index loading options as per
DM-11027
Sequential index loading : /qserv
Setting up
Set up a link to an input folder with TSV files as:
drwxr-xr-x 2 gapon babar 3448832 Jun 20 00:59 local_tsv
Prepared the loading script at:
#!/bin/bash
basedir=$1
db=$2
table=$3
user=$4
password=$5
done
Configured database service with:
[mysqld]
innodb_file_per_table=1
innodb_buffer_pool_size=128G
innodb_buffer_pool_instances=32
...
Restarted the service:
Created the database and the secondary index table:
Loading
Loading TSV files sequentially. Note the start time of the procedure will be extracted from the timestamps of the main log file:
Monitoring
Sampling I/O rates and CPU utilization at the beginning of loading and a few times through loading. The numbers may change as tables size will grow over time, and the InnodB engine will need to do more I/O to rebalance the B-Tree index of the table's PRIMARY key.
Beginning
The number pf TSV files have been loaded so far:
806
Disk I/O:
sdc 926.00 0.00 116.07 0 116
sdc 454.00 0.00 70.35 0 70
sdc 690.00 0.00 80.15 0 80
sdc 214.00 0.00 45.21 0 45
sdc 739.00 0.00 98.05 0 98
sdc 126.00 0.00 27.14 0 27
sdc 780.00 0.00 102.54 0 102
CPU utilization:
12799 mysql 20 0 0.137t 0.015t 10604 S 82.5 3.1 5:54.50 mysqld
After loading 14357 files
The number pf TSV files have been loaded so far:
14357
Disk I/O:
sdc 253.00 0.00 41.67 0 41
sdc 199.00 0.00 41.06 0 41
sdc 727.00 0.00 79.44 0 79
sdc 298.00 0.00 43.07 0 43
sdc 292.00 0.00 41.13 0 41
sdc 431.00 0.00 49.88 0 49
sdc 366.00 0.00 52.16 0 52
sdc 610.00 0.00 68.82 0 68
sdc 668.00 0.00 81.28 0 81
CPU utilization:
12799 mysql 20 0 0.137t 0.078t 10672 S 80.9 15.8 82:28.99 mysqld
After loading 45640 files, 6 hours later
Database table size on disk:
-rw-rw---- 1 mysql leftuser 203G Jun 24 07:21 sequential.ibd
Disk I/O:
sdc 575.00 0.00 61.88 0 61
sdc 518.00 0.00 43.89 0 43
sdc 224.00 0.00 37.04 0 37
sdc 222.00 0.00 44.32 0 44
sdc 635.00 0.23 78.63 0 78
sdc 413.00 0.00 57.37 0 57
sdc 1145.00 0.00 124.77 0 124
sdc 558.00 0.00 70.63 0 70
sdc 951.00 0.00 97.28 0 97
sdc 319.00 0.00 42.36 0 42
CPU utilization:
12799 mysql 20 0 0.137t 0.134t 10664 S 85.8 27.3 293:11.67 mysqld
After loading 74194 files, 16 hours later
Database table size on disk:
-rw-rw---- 1 mysql leftuser 478G Jun 24 18:17 sequential.ibd
Disk I/O:
sdc 652.00 0.00 78.00 0 78
sdc 954.00 0.00 107.38 0 107
sdc 544.00 0.00 79.30 0 79
sdc 46.00 0.00 9.24 0 9
sdc 576.00 0.00 75.52 0 75
sdc 47.00 0.00 11.11 0 11
sdc 996.00 0.00 111.47 0 111
sdc 376.00 0.00 53.12 0 53
sdc 632.00 0.00 83.87 0 83
sdc 307.00 0.00 47.93 0 47
CPU utilization:
12799 mysql 20 0 0.137t 0.132t 10400 S 89.4 26.8 782:56.78 mysqld
After loading 99482 files, 22 hours later
Database table size on disk:
-rw-rw---- 1 mysql leftuser 640G Jun 24 23:19 sequential.ibd
Disk I/O:
sdc 143.00 0.00 32.75 0 32
sdc 320.00 0.00 44.23 0 44
sdc 420.00 0.00 54.49 0 54
sdc 1064.00 0.00 108.51 0 108
sdc 155.00 0.00 36.76 0 36
sdc 179.00 0.00 30.28 0 30
sdc 715.00 0.00 89.21 0 89
sdc 56.00 0.00 12.90 0 12
sdc 574.00 0.00 73.27 0 73
CPU utilization:
12799 mysql 20 0 0.137t 0.131t 10400 S 82.2 26.7 1026:35 mysqld
Table optimization
Compact the table on disk to reclaim unused space as explained in:
Final size of the table before the optimization: 674 GB
Optimization tool:
#!/bin/bash
Run optimization:
Sun Jun 25 00:41:27 CEST 2017
<run>
Ho progress after 3 hours running as per:
-rw-rw---- 1 mysql leftuser 65 Jun 24 00:23 db.opt
-rw-rw---- 1 mysql leftuser 1014 Jun 24 00:23 sequential.frm
-rw-rw---- 1 mysql leftuser 674G Jun 25 00:28 sequential.ibd
The I/O rate was rather low to expect any chance of completion of this operation for the very large file:
...
sdc 250.00 6.81 4.00 6 4
sdc 217.00 6.08 4.00 6 4
sdc 235.00 6.02 4.00 6 4
sdc 251.00 6.62 5.00 6 5
sdc 205.00 5.12 4.00 5 4
sdc 213.00 5.70 6.00 5 6
sdc 234.00 3.94 22.00 3 22
sdc 157.00 2.80 10.00 2 10
sdc 118.00 2.56 5.00 2 5
sdc 179.00 4.89 3.00 4 3
sdc 175.00 5.14 2.00 5 2
sdc 154.00 3.73 10.00 3 10
sdc 218.00 6.88 2.00 6 2
sdc 211.00 5.80 4.00 5 4
sdc 188.00 5.11 7.00 5 7
sdc 211.00 1.84 36.00 1 36
sdc 235.00 6.02 3.00 6 3
sdc 223.00 5.75 3.00 5 3
sdc 260.00 7.88 3.00 7 3
sdc 230.00 6.06 2.00 6 2
sdc 243.00 5.81 4.00 5 4
sdc 229.00 5.61 2.00 5 2
sdc 251.00 7.19 3.00 7 3
sdc 241.00 7.11 3.00 7 3
sdc 268.00 8.02 3.00 8 3
sdc 228.00 6.94 3.00 6 3
sdc 213.00 6.84 3.00 6 3
sdc 266.00 7.20 3.03 7 3
sdc 243.00 7.27 3.00 7 3
sdc 214.00 6.31 2.00 6 2
sdc 196.00 6.14 3.00 6 3
sdc 190.00 6.78 3.00 6 3
sdc 214.00 6.98 3.00 6 3
...
CPU utilization was:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12799 mysql 20 0 0.137t 0.132t 10972 S 11.3 26.8 1107:27 mysqld
Therefore KILLED the operation after it took almost 3 hours w/o making any visible progress on the new (optimized) version of the table file using:
SHOW PROCESSLIST;
KILL 106902;
SHOW PROCESSLIST;
The input file hasn't changed:
-rw-rw---- 1 mysql leftuser 65 Jun 24 00:23 db.opt
-rw-rw---- 1 mysql leftuser 1014 Jun 24 00:23 sequential.frm
-rw-rw---- 1 mysql leftuser 674G Jun 25 00:28 sequential.ibd
Performance, observations
Total run time: 23:07.30
Table file size: 674 GB
Loading performance is presented below as the number of TSV files loaded within the 30 minute intervals and the I/O rate of reading the input TSV files averaged over the corresponding interval:
1797 2.4
1707 2.4
1249 2.2
440 1.3
493 1.3
1155 2.1
1705 2.4
1439 2.3
683 1.9
190 1.1
181 0.9
190 0.8
414 1.3
915 2.0
1337 2.4
1511 2.4
1560 2.4
1758 2.4
2026 2.5
2193 2.5
2393 2.5
2543 2.4
2668 2.4
2853 2.5
2961 2.5
3116 2.5
3173 2.5
3367 2.5
2960 2.0
1412 1.0
4143 2.8
4221 2.8
4177 2.7
4035 2.6
4024 2.7
3997 2.7
3861 2.7
3626 2.7
3506 2.7
3404 2.7
3152 2.6
2825 2.5
2792 2.5
2506 2.5
2309 2.4
2051 2.4
1873 2.4