# Managing SSL peer certification in the worker ingest services

#### Details

• Type: Improvement
• Status: Done
• Resolution: Done
• Fix Version/s: None
• Component/s:
• Labels: None
• Story Points: 6
• Sprint: DB_F20_09
• Team: Data Access and Database

# 1. Goals

This development was triggered by DM-27091.

The ticket is about enhancing the implementation of the Ingest system to allow authenticating object-store servers when pulling table contributions from the servers via the https:// protocol. The current version of the Ingest server fails with the following error message reported by libcurl:

 SSL peer certificate or SSH remote key was not OK 

## 1.1 Proposed solution (as it was originally filed when creating the ticket)

Documentation for the library (see https://curl.se/docs/sslcerts.html) offers a number of solutions, of which two are going to be implemented in this effort:

1. Tell libcurl to not verify the peer. With libcurl you disable this with curl_easy_setopt(curl, CURLOPT_SSL_VERIFYPEER, FALSE);

With the curl command line tool, you disable this with -k/--insecure.

2. Get a CA certificate that can verify the remote server and use the proper option to point out this CA cert for verification when connecting. For libcurl hackers: curl_easy_setopt(curl, CURLOPT_CAINFO, cacert);

With the curl command line tool: --cacert [file]
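
As a rough illustration of the two options on the command-line side, the sketch below assembles a curl argument list. The helper name `curl_command` and its parameters are hypothetical and are not part of the Ingest system; only the `--insecure` and `--cacert` flags come from curl itself.

```python
from typing import List, Optional


def curl_command(url: str, verify_peer: bool = True,
                 cacert: Optional[str] = None) -> List[str]:
    """Build a curl argument list for one of the two options above.

    Hypothetical helper for illustration only.
    """
    args = ["curl"]
    if not verify_peer:
        # Option 1: skip peer verification entirely.
        args.append("--insecure")
    elif cacert is not None:
        # Option 2: verify the peer against the given CA certificate file.
        args += ["--cacert", cacert]
    args.append(url)
    return args
```

The resulting list could be passed to `subprocess.run` as is. When peer verification is disabled the CA file is deliberately not added, since it would have no effect.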

To support option 1, the new implementation of the worker's REST service /ingest/file (method POST) will accept the optional parameter:

 {...  "CURLOPT_SSL_VERIFYPEER":<number>,  ... } 

Where a numeric value of the parameter would be treated as a boolean flag. By default, the service would not set this option if the attribute is absent from a request.

The second option would be to allow preloading the value of the server's certificate via the new REST service:

| method | resource |
| --- | --- |
| PUT | /ingest/setup |

The JSON object passed with the request would have the following schema:

 {...  "SET_CURLOPT_CAINFO":{  "transaction_id":<number>,  "value":<string>,  "name":<string>  },  ... } 

Where attribute value would carry a certificate to be stored on the local file system of the worker in connection with the specified transaction. The optional attribute name can be used to disambiguate between multiple certificates if more than one server is used during a transaction. Otherwise, the empty string would be assumed as the default value of the attribute name.
The information should be preloaded into a worker just once per transaction.

To make use of certificates preloaded into workers, the new implementation of the worker's REST service /ingest/file (method POST) would be extended to accept the optional parameter:

 {...  "CURLOPT_CAINFO":<string>,  ... } 

Where the value of the parameter would be the name of the server certificate set above. The name would normally be empty if only one certificate was set.

If both CURLOPT_SSL_VERIFYPEER and CURLOPT_CAINFO were provided AND if the former was set to 0 then the latter will be ignored.
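
The rules of the original proposal could be sketched as follows. This is only an illustration under stated assumptions: the function name `resolve_file_request`, the explicit `transaction_id` argument, and the `preloaded` mapping from (transaction, certificate name) to a local file path are all hypothetical and do not describe the actual worker code.

```python
from typing import Any, Dict, Tuple


def resolve_file_request(transaction_id: int,
                         request: Dict[str, Any],
                         preloaded: Dict[Tuple[int, str], str]) -> Dict[str, Any]:
    """Return the libcurl options a worker would set for one /ingest/file request.

    `preloaded` is an assumed registry of certificates stored earlier
    via PUT /ingest/setup, keyed by (transaction_id, name).
    """
    options: Dict[str, Any] = {}
    if "CURLOPT_SSL_VERIFYPEER" in request:
        # A numeric value is treated as a boolean flag.
        verify = bool(request["CURLOPT_SSL_VERIFYPEER"])
        options["CURLOPT_SSL_VERIFYPEER"] = 1 if verify else 0
        if not verify:
            # Verification is disabled, so CURLOPT_CAINFO is ignored.
            return options
    if "CURLOPT_CAINFO" in request:
        name = request["CURLOPT_CAINFO"]
        # Raises KeyError if no certificate was preloaded under this name.
        options["CURLOPT_CAINFO"] = preloaded[(transaction_id, name)]
    return options
```

If neither attribute is present, the returned dictionary is empty, which matches the default of not touching the libcurl options at all.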

### 1.1.1 Other possibilities

QUESTION: Should we consider a database-wide option for setting certificates? For example, a workflow would send the value of the certificate to the Master Replication Controller by calling a special (yet to be added) REST service. The same approach could be taken for option 1 explained above, which tells workers to skip certificate validation.

## 1.2 The refined solution (see comments posted below)

The original proposal is kept above to avoid invalidating comments and suggestions.

In the new proposal, the behavior of the Ingest system will be configured on a per-database basis with the following REST service to be added to the Master Replication Controller:

| method | resource |
| --- | --- |
| PUT | /ingest/config |

The JSON object passed with the request would have the following schema:

 {"auth_key":<string>,  "database":<string>,  "SSL_VERIFYPEER":<number>,  "CAPATH":<string>,  "CAINFO":<string>,  "CAINFO_VAL":<string> } 

Where:

• A number passed as a value of attribute SSL_VERIFYPEER would be treated as a boolean flag. Zero value corresponds to false.
• A string passed as a value of attribute CAPATH would be a path to a directory holding CA certificates available to workers.
• A string passed as a value of attribute CAINFO would be a path to a Certificate Authority (CA) bundle available to workers.
• A string passed as a value of attribute CAINFO_VAL would be the content of a Certificate Authority (CA) bundle to be stored in the Ingest system's configuration and deployed at a worker for each file ingested.

Other notes:

• Requests are required to provide a proper value of attribute auth_key (see documentation on the replication/Ingest system for further details on the security of the REST services).
• The name of a target database is also required in each such request.
• Attribute names (including SSL_VERIFYPEER, CAINFO, etc.) are case-sensitive.
• A request to the service may carry none, some, or all of the parameters. An empty request will be treated as "no request was made".
• Passing empty strings as values of attributes CAPATH, CAINFO or CAINFO_VAL will remove the corresponding options from the configuration.
• It's up to a client to set the parameters before using them.
• Parameters can be set and reset as many times as needed.
• Parameter SSL_VERIFYPEER (if present and if set to zero) will take precedence over the others.
• Parameter CAINFO_VAL (if present and if it's not empty) will take precedence over CAINFO.
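
The update and precedence rules listed above can be sketched in a few lines. This is a minimal illustration, not the controller's implementation: the function names `apply_update` and `effective_options` and the in-memory dictionary used as the per-database configuration are assumptions made for the example.

```python
from typing import Any, Dict

STRING_OPTIONS = ("CAPATH", "CAINFO", "CAINFO_VAL")


def apply_update(config: Dict[str, Any], request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a database's configuration updated by one PUT request."""
    updated = dict(config)
    if "SSL_VERIFYPEER" in request:
        # A number is treated as a boolean flag; zero means false.
        updated["SSL_VERIFYPEER"] = 1 if request["SSL_VERIFYPEER"] else 0
    for key in STRING_OPTIONS:
        if key in request:
            if request[key] == "":
                # An empty string removes the option from the configuration.
                updated.pop(key, None)
            else:
                updated[key] = request[key]
    return updated


def effective_options(config: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the precedence rules to a stored configuration."""
    if config.get("SSL_VERIFYPEER") == 0:
        # Disabled verification takes precedence over all other options.
        return {"SSL_VERIFYPEER": 0}
    effective = {k: config[k] for k in ("SSL_VERIFYPEER", "CAPATH") if k in config}
    if "CAINFO_VAL" in config:
        # A stored bundle value takes precedence over a bundle path.
        effective["CAINFO_VAL"] = config["CAINFO_VAL"]
    elif "CAINFO" in config:
        effective["CAINFO"] = config["CAINFO"]
    return effective
```

A request that carries only auth_key and database leaves the configuration unchanged, which corresponds to the "no request was made" rule.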

There won't be any changes to the existing API of the worker's REST service /ingest/file (method POST) in the new implementation of the system. This behavior may change in the future if needed.

In addition, a parameter retrieval service will be implemented in the Master Replication Controller:

| method | resource |
| --- | --- |
| GET | /ingest/config |

The service requires the name of a database be provided in the JSON object passed with requests:

 {"database":<string> } 

The service does not require any authorization.
The service will return the JSON object with the most recently set value of the parameters (if any were set):

 {"database":<string>,  "SSL_VERIFYPEER":<number>,  "CAPATH":<string>,  "CAINFO":<string>,  "CAINFO_VAL":<string> } 
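
From a client's point of view, the two /ingest/config services might be called along these lines. The sketch only builds the requests and does not send them. The base URL, the helper names, and the use of a JSON Content-Type header are assumptions for the example; only the resource path, the methods, and the attribute names come from the proposal.

```python
import json
import urllib.request
from typing import Any, Dict

# Hypothetical address of the Master Replication Controller.
BASE_URL = "http://controller.example.org"


def _config_request(method: str, body: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a request to /ingest/config with a JSON body."""
    return urllib.request.Request(
        BASE_URL + "/ingest/config",
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


def set_config_request(auth_key: str, database: str,
                       **options: Any) -> urllib.request.Request:
    """PUT request that sets options such as SSL_VERIFYPEER or CAINFO."""
    return _config_request("PUT", {"auth_key": auth_key, "database": database, **options})


def get_config_request(database: str) -> urllib.request.Request:
    """GET request that retrieves the options; no auth_key is needed."""
    return _config_request("GET", {"database": database})
```

Either request object could then be passed to `urllib.request.urlopen` and the response decoded with `json.load`.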

#### Activity

Andy Salnikov added a comment -

The code looks mostly OK, check comments on PR. What worries me a little bit is that we allow clients to provide certificates via the same channel and same credentials as all other operations. This feels like a backdoor, and in that sense it is not very different than just ignoring verification completely.

Igor Gaponenko added a comment -

Andy Salnikov Thank you for the review! I will consider all your suggestions made on the PR. Your concerns regarding having a potential "backdoor" are well-founded. Note that this implementation doesn't (and will not) support storing or carrying any sensitive information involving user credentials (like private SSH keys, passwords, etc.). The optional mechanism for managing peer certification has been added to address cases when there are site-specific problems with site-local data sources. An assumption here is that it should be up to a client ingesting data from a given source to decide if this is a trusted source. This is pretty much like specifying similar options when using curl or Python's requests module. In my opinion, this mechanism should be used in rare cases, primarily as a temporary solution. The documentation on the Ingest system will warn users on this subject.

The mechanism had to be added for the Kubernetes-based deployment where a client workflow has no control over the configuration of the worker pods or containers.

Kian-Tat Lim added a comment -

Andy Salnikov Think of it as providing a location and an MD5 hash of data to be downloaded. In this case, it's checking the data server, not the data content, but it's similarly OK for both to come from the same place. The point is not to ensure that the data is coming from the right server, it's to ensure that the server being retrieved from is actually who it claims to be.

Andy Salnikov added a comment -

Still does not make much sense to me. Verification only works (most of the time) if there is some authority which provides you with the correct identity/cert. If that authority is the same user then it's just an additional nuisance to the user.

Kian-Tat Lim added a comment -

Here's another analogy: I tell you to go pick up the data from my house, which is at some address that I give you. How will you be sure it's my house? I give you a key that fits in the door. If you went to the wrong house because someone changed the number or the street sign, the key wouldn't fit. Of course I could tell you to go to some other house and not even give you a key, but that's not what we're defending against.


#### People

Assignee:
Igor Gaponenko
Reporter:
Igor Gaponenko
Reviewers:
Andy Salnikov
Watchers:
Andy Salnikov, Fabrice Jammes, Fritz Mueller, Igor Gaponenko, Kian-Tat Lim, Nate Pease