Data Management / DM-20122

Investigate client auth patterns for LSST the Docs

    Details

      Description

      This is an investigation, design, and planning ticket for the larger DM-18720 epic.

      The basic problem is that clients for LSST the Docs generally shouldn't have S3 credentials. Credentials are extra things for clients to track, and they can give clients more access to S3 than they need. We want to move to a model where LTD Keeper is the only application with credentials to LTD's S3 bucket. The question, then, is how LTD Keeper should provide access for clients to upload new builds into the S3 bucket. The main options are:

      1. LTD Keeper creates a temporary S3 credential for the client, and then ensures that the credential is deleted/rotated.
      2. LTD Keeper generates presigned URLs that the client can upload to.
      3. The client uploads files to LTD Keeper, and LTD Keeper forwards those objects to S3.

            Activity

            jsick Jonathan Sick added a comment -

            S3 credential options for LSST the Docs

             

            This note outlines the options for securing client uploads to LSST the Docs:

             

            1. Use presigned paths that are generated on the server (LTD Keeper) and used by the clients.
            2. Use temporary credentials that LTD Keeper creates for the client.
            3. Have clients upload directly to the LTD Keeper server, and have LTD Keeper perform all S3 operations.

             

            Presigned paths

             

            In this regime, LTD Keeper is responsible for auth, and the upload client never receives S3 credentials.

             

            Uploading Objects Using Presigned URLs - Amazon Simple Storage Service

             

            A presigned URL gives you access to the object identified in the URL, provided that the creator of the presigned URL has permissions to access that object. That is, if you receive a presigned URL to upload an object, you can upload the object only if the creator of the presigned URL has the necessary permissions to upload that object.

             

            All objects and buckets by default are private. The presigned URLs are useful if you want your user/customer to be able to upload a specific object to your bucket, but you don't require them to have AWS security credentials or permissions. When you create a presigned URL, you must provide your security credentials and then specify a bucket name, an object key, an HTTP method (PUT for uploading objects), and an expiration date and time. The presigned URLs are valid only for the specified duration.

             

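            Boto3 supports generating presigned URLs. Its generate_presigned_post method presigns a URL for a POST-based upload. See Presigned URLs (Boto 3 Docs 1.9.166 documentation), which includes an example of uploading a file with a presigned URL.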

             

             

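            S3 uploads can be done through either PUT or POST methods. The POST method is more flexible because the upload is governed by a signed policy. The linked article describes POST as supporting multi-part uploads of up to 5TB, but that figure appears to belong to S3's separate multipart upload API; as far as I know, a single PUT or POST request is limited to 5GB. See Securing AWS S3 uploads using presigned URLs – Aidan Hallett – Medium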

             

            Using a POST policy

             

             

            Another method, which is either similar to or exactly the same as the presigned URL, is a POST policy. The interesting thing about it is that the full key doesn't seem to need to be known a priori, which is more efficient for a client uploading large numbers of objects.

             

            This is discussed in this post:

             

            Demystifying direct uploads from the browser to Amazon S3 - with a full example in 167 lines of code - Leonid Shevtsov

             

            • The server creates a signed policy. It seems that this policy doesn't need to be limited to a single key: the policy can use a starts-with condition on the key rather than an exact key match. The details for the parameters associated with the POST are at Creating a POST Policy - Amazon Simple Storage Service.
            • The policy can enforce other aspects of the upload beyond a key or key prefix; it can also control upload size, access policy, and headers.
            • The presigned URL has a limited lifetime.
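            To make the bullets above concrete, here is a minimal standard-library sketch of what the server side of a POST policy looks like under Signature Version 4. The function name sign_post_policy, the 100 MB size cap, and the placeholder credentials are assumptions for illustration only; in practice boto3 does this work, and this is not how LTD Keeper is known to implement it.

```python
import base64
import datetime
import hashlib
import hmac
import json


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sign_post_policy(access_key, secret_key, region, bucket, key_prefix,
                     now, expires_in=3600, max_bytes=100 * 1024 * 1024):
    """Return form fields for an S3 POST upload restricted to key_prefix.

    `now` is the current time in UTC. All names here are illustrative.
    """
    date_stamp = now.strftime("%Y%m%d")
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    credential = f"{access_key}/{date_stamp}/{region}/s3/aws4_request"
    expiration = now + datetime.timedelta(seconds=expires_in)

    policy = {
        "expiration": expiration.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "conditions": [
            {"bucket": bucket},
            # Any key is allowed as long as it begins with the prefix.
            ["starts-with", "$key", key_prefix],
            # The policy can also bound the upload size.
            ["content-length-range", 0, max_bytes],
            {"x-amz-algorithm": "AWS4-HMAC-SHA256"},
            {"x-amz-credential": credential},
            {"x-amz-date": amz_date},
        ],
    }
    encoded_policy = base64.b64encode(
        json.dumps(policy).encode("utf-8")).decode("ascii")

    # Derive the SigV4 signing key, then sign the base64-encoded policy.
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, "s3")
    k_signing = _hmac_sha256(k_service, "aws4_request")
    signature = hmac.new(
        k_signing, encoded_policy.encode("ascii"), hashlib.sha256).hexdigest()

    return {
        "key": key_prefix + "${filename}",
        "policy": encoded_policy,
        "x-amz-algorithm": "AWS4-HMAC-SHA256",
        "x-amz-credential": credential,
        "x-amz-date": amz_date,
        "x-amz-signature": signature,
    }


# Example with placeholder (non-functional) credentials.
example_fields = sign_post_policy(
    "AKIAEXAMPLE", "not-a-real-secret", "us-east-1",
    "example-bucket", "example-product/builds/1/",
    now=datetime.datetime.now(datetime.timezone.utc))
```

            Only the policy is signed, not the individual form fields, which is why one signed policy can cover many different keys under the same prefix.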

             

            While the earlier example showed how to create a presigned POST policy, it seems that the same workflow is now available from boto3 itself with the generate_presigned_post method.

             

            To enable presigned uploads to a specific prefix, specify the Key parameter of generate_presigned_post using a ${filename} token. For example, myprefix/${filename} ensures that all uploads must be to the myprefix prefix.
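            As a rough sketch of how the server might wrap this call, assuming a boto3 S3 client is passed in: the helper name presign_build_upload and the example bucket and prefix are hypothetical, while generate_presigned_post and its Bucket, Key, and ExpiresIn parameters are the boto3 API discussed here.

```python
def presign_build_upload(s3_client, bucket, build_prefix, expires_in=3600):
    """Create one presigned POST covering every object under build_prefix.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    The return value is boto3's dict with "url" and "fields" entries.
    """
    # Normalize to exactly one trailing slash so the prefix is unambiguous.
    prefix = build_prefix.rstrip("/") + "/"
    return s3_client.generate_presigned_post(
        Bucket=bucket,
        # boto3 turns a key ending in ${filename} into a starts-with
        # condition on the prefix.
        Key=prefix + "${filename}",
        ExpiresIn=expires_in,
    )


# Hypothetical usage (requires boto3 and real credentials):
#   import boto3
#   post = presign_build_upload(
#       boto3.client("s3"), "example-bucket", "example-product/builds/1")
#   post["url"], post["fields"]
```

            Passing the client in as an argument keeps the helper easy to exercise with a stub, without touching AWS.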

             

            Here is a snippet from the boto3 tests:

             

                def test_generate_presigned_post_with_filename(self):
                    self.key = 'myprefix/${filename}'
                    self.client.generate_presigned_post(self.bucket, self.key)

                    _, post_kwargs = self.presign_post_mock.call_args
                    request_dict = post_kwargs['request_dict']
                    fields = post_kwargs['fields']
                    conditions = post_kwargs['conditions']
                    self.assertEqual(
                        request_dict['url'], 'https://s3.amazonaws.com/mybucket')
                    self.assertEqual(post_kwargs['expires_in'], 3600)
                    self.assertEqual(
                        conditions,
                        [{'bucket': 'mybucket'}, ['starts-with', '$key', 'myprefix/']])
                    self.assertEqual(
                        fields,
                        {'key': 'myprefix/${filename}'})

             

            Question: where does the full key get set? 

             

            In requests post, the files key can take an explicit filename:

             

            >>> url = 'https://httpbin.org/post'
            >>> files = {'file': ('report.xls', open('report.xls', 'rb'), 'application/vnd.ms-excel', {'Expires': '0'})}

            >>> r = requests.post(url, files=files)
            >>> r.text
            {
              ...
              "files": {
                "file": "<censored...binary...data>"
              },
              ...
            }

             

            Does the filename here automatically become the ${filename} part of the key when the upload is done with the presigned URL? Or does the filename in the POST need to have the full key, including prefix, in order for the upload to work? This is something that can be determined experimentally.

             

            Answer: yes, S3 interpolates the filename from the POST into the ${filename} part of the key string, per the S3 documentation:

             

            To use the file name provided by the user, use the ${filename} variable. For example, if the user Betty uploads the file lolcatz.jpg and you specify /user/Betty/${filename}, the key name is /user/Betty/lolcatz.jpg. See Object Key and Metadata - Amazon Simple Storage Service

             

            Temporary credentials

             

            The client could use temporary AWS credentials that are created by LTD Keeper and assigned to the client only when the client initiates a build upload.

             

            The advantages of this are:

             

            • The upload itself will look similar to how it's already being done.
            • The client does not need to manage its own credentials.
            • The credentials could be scoped to a specific build upload (limit what prefixes can be uploaded and what S3 operations can be done).

             

            The best way to implement temporary credentials is through the AWS Security Token Service, described in Temporary Security Credentials - AWS Identity and Access Management:

             

            You can use the AWS Security Token Service (AWS STS) to create and provide trusted users with temporary security credentials that can control access to your AWS resources. Temporary security credentials work almost identically to the long-term access key credentials that your IAM users can use, with the following differences:

            * Temporary security credentials are short-term, as the name implies. They can be configured to last for anywhere from a few minutes to several hours. After the credentials expire, AWS no longer recognizes them or allows any kind of access from API requests made with them. 

            * Temporary security credentials are not stored with the user but are generated dynamically and provided to the user when requested. When (or even before) the temporary security credentials expire, the user can request new credentials, as long as the user requesting them still has permissions to do so. 

             

            These differences lead to the following advantages for using temporary credentials:

            * You do not have to distribute or embed long-term AWS security credentials with an application. 

            * You can provide access to your AWS resources to users without having to define an AWS identity for them. Temporary credentials are the basis for roles and identity federation.

            * The temporary security credentials have a limited lifetime, so you do not have to rotate them or explicitly revoke them when they're no longer needed. After temporary security credentials expire, they cannot be reused. You can specify how long the credentials are valid, up to a maximum limit.

             

            The temporary credentials are used similarly to long-term credentials:

             

            You can use temporary security credentials to make programmatic requests for AWS resources with the AWS SDKs or API calls, the same way that you can use long-term security credentials such as IAM user credentials. However, there are a few differences:

            * When you make a call using temporary security credentials, the call must include a session token, which is returned along with those temporary credentials. AWS uses the session token to validate the temporary security credentials. 

            * The temporary credentials expire after a specified interval. After the credentials expire, any calls that you make with those credentials will fail, so you must get a new set of credentials.
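            For comparison with the presigned approach, here is a hedged sketch of how LTD Keeper could issue credentials scoped to one build through STS. The helper names, the role ARN, the session name, and the choice of assume_role with an inline session policy are all assumptions for illustration; STS offers other calls that would also work.

```python
import json


def build_upload_policy(bucket, key_prefix):
    """Session policy that only allows writing objects under key_prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{key_prefix}*"],
            }
        ],
    }


def issue_upload_credentials(sts_client, role_arn, bucket, key_prefix,
                             session_name="ltd-upload",
                             duration_seconds=900):
    """Request short-lived credentials limited to a single build prefix.

    sts_client is expected to be a boto3 STS client, e.g.
    boto3.client("sts"). The role ARN is a hypothetical upload role.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        # The session policy narrows whatever the role itself permits.
        Policy=json.dumps(build_upload_policy(bucket, key_prefix)),
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    # The session token must accompany the key pair on every request.
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Hypothetical client-side usage of the returned mapping:
#   import boto3
#   s3 = boto3.client("s3", **credentials)
```

            Compared with a presigned POST, this keeps boto3 in the upload client and adds a role and session policy for LTD Keeper to manage.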

             

            Uploading to LTD Keeper

             

            Given that presigned POST URLs should work well, there is no need to seriously consider uploading directly to the LTD Keeper application. Doing so would increase the load on LTD Keeper and probably slow down uploads, since every object would pass through LTD Keeper before being forwarded to S3.

             

            Conclusion

             

            Let's implement presigned POST paths. It's the cleanest solution and should further simplify the upload client by letting us use plain POST calls rather than boto3. Since a single presigned POST policy (URL) can be generalized to work for a specific key prefix, it should be possible for LTD Keeper to create one presigned URL for a whole build. The client can use that URL for multiple uploads, and it doesn't need to inform LTD Keeper of all the keys that are being uploaded.
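            The client side of this plan could look roughly like the sketch below. The function name upload_build and the injected http_post callable (anything with the same signature as requests.post) are assumptions for illustration. The sketch fills in the key field itself so that nested paths are preserved, on the assumption that the policy's starts-with condition is what S3 enforces.

```python
from pathlib import Path


def upload_build(presigned_post, build_dir, http_post):
    """POST every file under build_dir using a single presigned POST.

    presigned_post: dict with "url" and "fields", as returned by boto3's
        generate_presigned_post.
    http_post: callable with the same signature as requests.post.
    Returns a mapping of object key to HTTP status code.
    """
    build_dir = Path(build_dir)
    statuses = {}
    for path in sorted(p for p in build_dir.rglob("*") if p.is_file()):
        relative_path = path.relative_to(build_dir).as_posix()
        # Copy the fields so the shared presigned data is never mutated.
        fields = dict(presigned_post["fields"])
        # Fill in the key client-side so nested paths are preserved; the
        # policy's starts-with condition still restricts it to the prefix.
        fields["key"] = fields["key"].replace("${filename}", relative_path)
        with path.open("rb") as file_obj:
            response = http_post(
                presigned_post["url"],
                data=fields,
                files={"file": (path.name, file_obj)},
            )
        statuses[fields["key"]] = response.status_code
    return statuses


# Hypothetical usage (requires the requests package and a real presigned POST):
#   import requests
#   statuses = upload_build(post, "_build/html", requests.post)
```

            The same url and fields are reused for every file, so the client never has to tell LTD Keeper which keys it is about to upload.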

             

              People

              • Assignee: jsick Jonathan Sick
              • Reporter: jsick Jonathan Sick
              • Watchers: Jonathan Sick