# Ingesting the secondary index into Qserv from local files

XMLWordPrintable

#### Details

• Type: Improvement
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels:
None
• Story Points:
4
• Epic Link:
• Sprint:
DB_F20_09
• Team:
Data Access and Database

# Objectives

An implementation of the "secondary index" builder in the current version of the Replication/Ingest system requires that intermediate files (index contributions into the index table) were placed at a folder that would be directly seen by the MySQL server of Qserv czar. Once a file is placed into that folder the Master Controller of the Replication system would issue MySQL statement to load the file in the table:

 LOAD DATA INFILE ... 

As a direct consequence of that, both the MySQL server's container and the Replication Controller containers are required to share a folder where the contributions get stored. A problem is that in the Kubernetes environment, this is not always possible. It also makes the implementation of such distributed applications more complex and error-prone.

Hence, the goal of this effort is to extend the implementation of the system to allow an option for loading contributions using MySQL statements of the following kind:

 LOAD DATA LOCAL INFILE ... 

## Notes

According to the MySQL documentation of this table ingest technique https://dev.mysql.com/doc/refman/8.0/en/load-data.html one should be aware of the following caveats of the technique:

• error handlings gets more complex:
• incorrect data and duplicate keys may be ignored during the ingest
• any problems are reported by MySQL at the WARNING level rather than be reported as errors
• the overall performance of the index builder may be affected

The last complication should be taken with care due to a scale of the index (in terms of the number of rows to be ingested into the index table). In some past tests, it took up to 24 hours to load a table comprising 18 billion rows.

Additional considerations for using the proposed mechanism in the Qserv deplotements:

## Documentation

Update the following documents due to changes in the REST API and behavior of the new implementation:

#### Activity

Hide
Igor Gaponenko added a comment - - edited

# Large scale tests of the extended protocol

Using catalog sdss_stripe82_01 and director table RunDeepSource for the tests. The table is large enough to study effects of the new protocol.

Record the current number of objects in the "secondary index" of the catalog for the quality control of the tests:

 % mysql --protocol=tcp -hlocalhost -P3306 -uroot -p***** qservMeta \  -e "SELECT COUNT(*) FROM qservMeta.sdss_stripe82_01__RunDeepSource" 

 +-----------+ | COUNT(*) | +-----------+ | 186671762 | +-----------+ 

Rebuild the catalog using the original non-local version of the builder (by default, if option local is not specified, it's assumed to be 0):

 % time curl http://localhost:25080/ingest/index/secondary -X POST \  -H 'Content-Type: application/json' \  -d'{"auth_key":"kukara4a","database":"sdss_stripe82_01","allow_for_published":1,"rebuild":1}'   real 19m6.626s user 0m0.029s sys 0m0.039s 

Check the numbe of objects in the updated index:

 SELECT COUNT(*) FROM qservMeta.sdss_stripe82_01__RunDeepSource; +-----------+ | COUNT(*) | +-----------+ | 186671762 | +-----------+ 

NOw, rebuild the index using the local option:

 % time curl http://localhost:25080/ingest/index/secondary -X POST \  -H 'Content-Type: application/json' \  -d'{"auth_key":"kukara4a","database":"sdss_stripe82_01","allow_for_published":1,"rebuild":1,"local":1}'   real 19m9.016s user 0m0.028s sys 0m0.047s 

Check the number of entries in the index table to ensure it's the same as before:

 SELECT COUNT(*) FROM qservMeta.sdss_stripe82_01__RunDeepSource; +-----------+ | COUNT(*) | +-----------+ | 186671762 | +-----------+ 

## Conclusions:

• the new version of the builder produces similar results as the older one ( a simple method of counting rows is still sufficient to demonstrate the "similarity" due to UNIQUE constraint on the primary key of the table).
• performance degradation of the non-local build mode versus the local one is negligible, thereforo it's no longer a point of concern.
Show
Igor Gaponenko added a comment - - edited Large scale tests of the extended protocol Using catalog sdss_stripe82_01 and director table RunDeepSource for the tests. The table is large enough to study effects of the new protocol. Record the current number of objects in the "secondary index" of the catalog for the quality control of the tests: % mysql --protocol=tcp -hlocalhost -P3306 -uroot -p***** qservMeta \ -e "SELECT COUNT(*) FROM qservMeta.sdss_stripe82_01__RunDeepSource" + -----------+ | COUNT (*) | + -----------+ | 186671762 | + -----------+ Rebuild the catalog using the original non-local version of the builder (by default, if option local is not specified, it's assumed to be 0 ): % time curl http: //localhost :25080 /ingest/index/secondary -X POST \ -H 'Content-Type: application/json' \ -d '{"auth_key":"kukara4a","database":"sdss_stripe82_01","allow_for_published":1,"rebuild":1}'   real 19m6.626s user 0m0.029s sys 0m0.039s Check the numbe of objects in the updated index: SELECT COUNT (*) FROM qservMeta.sdss_stripe82_01__RunDeepSource; + -----------+ | COUNT (*) | + -----------+ | 186671762 | + -----------+ NOw, rebuild the index using the local option: % time curl http: //localhost :25080 /ingest/index/secondary -X POST \ -H 'Content-Type: application/json' \ -d '{"auth_key":"kukara4a","database":"sdss_stripe82_01","allow_for_published":1,"rebuild":1,"local":1}'   real 19m9.016s user 0m0.028s sys 0m0.047s Check the number of entries in the index table to ensure it's the same as before: SELECT COUNT (*) FROM qservMeta.sdss_stripe82_01__RunDeepSource; + -----------+ | COUNT (*) | + -----------+ | 186671762 | + -----------+ Conclusions: the new version of the builder produces similar results as the older one ( a simple method of counting rows is still sufficient to demonstrate the "similarity" due to UNIQUE constraint on the primary key of the table). performance degradation of the non-local build mode versus the local one is negligible, thereforo it's no longer a point of concern.
Hide
Nate Pease added a comment -

I had one very minor comment, but otherwise looks fine.

Show
Nate Pease added a comment - I had one very minor comment, but otherwise looks fine.

#### People

Assignee:
Igor Gaponenko
Reporter:
Igor Gaponenko
Reviewers:
Nate Pease
Watchers:
Fabrice Jammes, Fritz Mueller, Igor Gaponenko, Nate Pease
Votes:
0 Vote for this issue
Watchers:
4 Start watching this issue

#### Dates

Created:
Updated:
Resolved: