S3 credential options for LSST the Docs
This note outlines the basic options for securing client uploads to LSST the Docs:
- Use presigned paths that are generated on the server (LTD Keeper) and used by the clients.
- Use temporary credentials that are generated on the server and scoped to a single upload.
- Have clients upload directly to the LTD Keeper server, and have LTD Keeper perform all S3 operations.
Presigned paths
In this regime, LTD Keeper is responsible for auth, and the upload client never receives S3 credentials.
Uploading Objects Using Presigned URLs - Amazon Simple Storage Service
A presigned URL gives you access to the object identified in the URL, provided that the creator of the presigned URL has permissions to access that object. That is, if you receive a presigned URL to upload an object, you can upload the object only if the creator of the presigned URL has the necessary permissions to upload that object.
All objects and buckets by default are private. The presigned URLs are useful if you want your user/customer to be able to upload a specific object to your bucket, but you don't require them to have AWS security credentials or permissions. When you create a presigned URL, you must provide your security credentials and then specify a bucket name, an object key, an HTTP method (PUT for uploading objects), and an expiration date and time. The presigned URLs are valid only for the specified duration.
Boto3 supports generating presigned URLs, and its generate_presigned_post method presigns a URL for a POST-based upload. S3 uploads can be done with either the PUT or POST method; POST is the more capable of the two and, per Securing AWS S3 uploads using presigned URLs – Aidan Hallett – Medium, supports multipart uploads of up to 5 TB.
Using a POST policy
Another method, closely related to (if not the same as) the presigned URL, is a POST policy. The interesting thing about this is that it doesn't seem necessary to know the full key a priori, which is more efficient for a client uploading large numbers of objects.
This is discussed in this post:
Demystifying direct uploads from the browser to Amazon S3 - with a full example in 167 lines of code - Leonid Shevtsov
- The server creates a signed policy. It seems that this policy doesn't need to be limited to a single key. Instead, the policy can use starts-with instead of key. The details for the parameters associated with the POST are at Creating a POST Policy - Amazon Simple Storage Service.
- The policy can enforce other aspects of the upload beyond a key or key prefix; it can also control upload size, access policy, and headers.
- The presigned URL has a limited lifetime, set by the policy's expiration time.
While the earlier example showed how to create a presigned POST policy, it seems that the same workflow is now available from boto3 itself with the generate_presigned_post method.
To enable presigned uploads to a specific prefix, specify the Key parameter of generate_presigned_post using a ${filename} token. For example, myprefix/${filename} ensures that all uploads must go to the myprefix prefix.
Here is a snippet from the boto3 tests:
def test_generate_presigned_post_with_filename(self):
    self.key = 'myprefix/${filename}'
    self.client.generate_presigned_post(self.bucket, self.key)
    _, post_kwargs = self.presign_post_mock.call_args
    request_dict = post_kwargs['request_dict']
    fields = post_kwargs['fields']
    conditions = post_kwargs['conditions']
    self.assertEqual(
        request_dict['url'], 'https://s3.amazonaws.com/mybucket')
    self.assertEqual(post_kwargs['expires_in'], 3600)
    self.assertEqual(
        conditions,
        [{'bucket': 'mybucket'}, ['starts-with', '$key', 'myprefix/']])
    self.assertEqual(
        fields,
        {'key': 'myprefix/${filename}'})
Question: where does the full key get set?
In requests.post, the files argument can take an explicit filename:
>>> url = 'https://httpbin.org/post'
>>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}
>>> r = requests.post(url, files=files)
>>> r.text
{
  ...
  "files": {
    "file": "<censored...binary...data>"
  },
  ...
}
Does the filename here automatically become the $(filename) part of the key when the upload is done with the presigned URL? Or does the filename in the POST need to have the full key, including prefix, in order for the upload to work? This is something that can be determined experimentally.
Answer: yes, the filename is interpolated into the ${filename} part of the key string:
To use the file name provided by the user, use the ${filename} variable. For example, if the user Betty uploads the file lolcatz.jpg and you specify /user/Betty/${filename}, the key name is /user/Betty/lolcatz.jpg. See Object Key and Metadata - Amazon Simple Storage Service
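Putting the pieces together, a client upload could look like the following sketch. The url, policy, and signature values are placeholders standing in for what LTD Keeper would return; the request is only prepared here, not sent:

```python
import requests

# Hypothetical output of generate_presigned_post, as returned to the client.
presigned = {
    'url': 'https://s3.amazonaws.com/example-bucket',
    'fields': {
        'key': 'myprefix/${filename}',
        'policy': 'PLACEHOLDER-POLICY',
        'signature': 'PLACEHOLDER-SIGNATURE',
    },
}

# The presigned form fields go in `data`; requests encodes them before the
# file part, which S3 requires. The filename given in the `files` tuple is
# what S3 substitutes into ${filename}.
request = requests.Request(
    'POST',
    presigned['url'],
    data=presigned['fields'],
    files={'file': ('index.html', b'<html></html>')},
).prepare()

# In real use the client would simply send it:
#     requests.Session().send(request)
```

Because the same presigned fields can be reused for every file under the prefix, the client only varies the filename from one POST to the next.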
Temporary credentials
The client could use temporary AWS credentials that are created by LTD Keeper and assigned to the client only when the client initiates a build upload.
The advantages of this are:
- The upload itself will look similar to how it's already being done.
- The client does not need to manage its own credentials.
- The credentials could be scoped to a specific build upload (limit what prefixes can be uploaded and what S3 operations can be done).
The best way to implement temporary credentials is with AWS STS; see Temporary Security Credentials - AWS Identity and Access Management:
You can use the AWS Security Token Service (AWS STS) to create and provide trusted users with temporary security credentials that can control access to your AWS resources. Temporary security credentials work almost identically to the long-term access key credentials that your IAM users can use, with the following differences:
* Temporary security credentials are short-term, as the name implies. They can be configured to last for anywhere from a few minutes to several hours. After the credentials expire, AWS no longer recognizes them or allows any kind of access from API requests made with them.
* Temporary security credentials are not stored with the user but are generated dynamically and provided to the user when requested. When (or even before) the temporary security credentials expire, the user can request new credentials, as long as the user requesting them still has permissions to do so.
These differences lead to the following advantages for using temporary credentials:
* You do not have to distribute or embed long-term AWS security credentials with an application.
* You can provide access to your AWS resources to users without having to define an AWS identity for them. Temporary credentials are the basis for roles and identity federation.
* The temporary security credentials have a limited lifetime, so you do not have to rotate them or explicitly revoke them when they're no longer needed. After temporary security credentials expire, they cannot be reused. You can specify how long the credentials are valid, up to a maximum limit.
The temporary credentials are used similarly to long-term credentials:
You can use temporary security credentials to make programmatic requests for AWS resources with the AWS SDKs or API calls, the same way that you can use long-term security credentials such as IAM user credentials. However, there are a few differences:
* When you make a call using temporary security credentials, the call must include a session token, which is returned along with those temporary credentials. AWS uses the session token to validate the temporary security credentials.
* The temporary credentials expire after a specified interval. After the credentials expire, any calls that you make with those credentials will fail, so you must get a new set of credentials.
Uploading to LTD Keeper
Given that presigned POST URLs should work well, there's clearly no need to seriously consider uploading directly to the LTD Keeper application. Doing so would increase the load on LTD Keeper and probably slow down uploads.
Conclusion
Let's implement presigned POST URLs. This is the cleanest solution and should further simplify the upload client by letting us use plain POST calls rather than boto3. Since a single presigned POST policy (URL) can be generalized to a key prefix, it should be possible for LTD Keeper to create one presigned URL for a whole build. The client can use that URL for multiple uploads and doesn't need to inform LTD Keeper of all the keys being uploaded.