Status: In Progress
Fix Version/s: None
Team: Data Access and Database
The ticket was noticed while working on
The current implementation of the Qserv ingest system does not support ingesting data of the binary SQL types. This includes the MySQL types:
The problem occurs in two classes responsible for reading data files:
In all three cases, the code does not recognize an escape character preceding the EOL (end-of-line) character '\n' when the latter occurs within the binary data. As a result, the first EOL character encountered is interpreted as the line terminator, even if it is escaped.
For example, consider the following schema:
In this case, the input row presented below is interpreted by the current code as two separate rows (provided the column terminator character is a comma and the escape character is the backslash):
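The failure mode can be reproduced with a minimal Python sketch. The row below is a hypothetical illustration, not the row from the ticket: it assumes an integer column followed by a binary column whose payload contains an escaped newline. The sketch only mimics a reader that treats every '\n' as a line terminator; it is not the Qserv code.

```python
# Hypothetical input: one logical row "1,<binary>" where the binary payload
# is the bytes 'a', 'b', '\n', 'c', 'd' and the newline is escaped by a backslash.
row = b"1,ab\\\ncd\n"

# Naive reading: split at every '\n', whether or not it is escaped.
# The trailing empty element after the final terminator is dropped.
naive_rows = row.split(b"\n")[:-1]

# One logical row has been broken into two physical "rows".
print(len(naive_rows), naive_rows)
```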
- Extend the table loading interfaces (the uploader application qserv-replica-file and the worker ingest REST service /ingest/file) with an option for specifying the desired escape character.
- Use the escape character when reading and preprocessing input files in the above-mentioned classes.
- Pass the escape character to MySQL when ingesting the preprocessed files using LOAD DATA INFILE.
- To avoid code duplication, introduce a utility class shared by the implementations of the methods IngestClient::send and IngestHttpSvcMod::_readLocal.
- Have a look at how this problem is addressed (if it is addressed) in the Git package https://github.com/lsst/partition
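The core of the proposed changes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Qserv implementation: the function names `split_rows` and `load_data_sql`, their parameters, and the table and file names in the usage note are all hypothetical. The first function splits a buffer into rows while honoring a single-byte escape character; the second builds a MySQL `LOAD DATA INFILE` statement that passes the same escape character through the `ESCAPED BY` clause.

```python
from typing import List


def split_rows(data: bytes, escape: bytes = b"\\", eol: bytes = b"\n") -> List[bytes]:
    """Split data into rows at unescaped EOL bytes (hypothetical helper).

    Escape sequences are left untouched in the output so that MySQL can
    interpret them later during LOAD DATA INFILE.
    """
    rows = []
    start = 0
    i = 0
    while i < len(data):
        ch = data[i:i + 1]
        if ch == escape:
            # Skip the escape character and the byte it protects,
            # which may be an EOL or another escape character.
            i += 2
            continue
        if ch == eol:
            rows.append(data[start:i])
            start = i + 1
        i += 1
    if start < len(data):
        # Trailing bytes without a terminator form an incomplete last row.
        rows.append(data[start:])
    return rows


def load_data_sql(path: str, table: str, terminator: str = ",", escape: str = "\\") -> str:
    """Build a LOAD DATA INFILE statement with an explicit escape character."""
    def lit(s: str) -> str:
        # Simplified SQL string literal quoting, for illustration only.
        return "'" + s.replace("\\", "\\\\").replace("'", "\\'") + "'"

    return (
        f"LOAD DATA INFILE {lit(path)} INTO TABLE `{table}` "
        f"FIELDS TERMINATED BY {lit(terminator)} ESCAPED BY {lit(escape)}"
    )
```

With these assumptions, `split_rows(b"1,ab\\\ncd\n2,ef\n")` yields two rows, the first of which still contains the escaped newline, and `load_data_sql("/tmp/chunk.csv", "Object")` produces a statement ending in `ESCAPED BY '\\'`. A real implementation would also have to handle an escape character that falls on the boundary between two read buffers.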