RFC-104 (Request For Comments)

GitLFS as a solution to storing data files in repos

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None
    • Location:
      This ticket or SQuaRE hipchat room

      Description

      SQuaRE proposes to use the open source GitLFS protocol as a solution for storing large files in repos in general, and in binary-heavy repos like afwdata in particular. We are proposing this solution after evaluation because its workflow is very close to what people are already used to, and it allows almost normal interaction with GitHub while storing the files on an in-house backend server, so that we do not have to pay hosting fees for the data. We are satisfied that this is a maintainable solution that is friendly to non-project contributors. While GitLFS is still a young project barely out of beta, we have noted a good rate of bugfixing and have had good interactions with the development team. It has also seen good adoption beyond GitHub, from both Microsoft and Atlassian.

      As part of the evaluation, J Matt Peterson [X] deployed a cloud-based GitLFS server on the NCSA Nebula OpenStack cluster, which we now invite you to try out. We have set up a test repository in the lsst organisation that is seeded with afwdata:

      https://github.com/lsst/afwdata-cowboy

      Feel free to do whatever you want with this; it will be deleted once the RFC is over. If you are in the Data Management team, you should be able to push to the repo.

      Instructions on how to interact with the repository are in the README:

      https://github.com/lsst/afwdata-cowboy/blob/master/README.md

      Note that this is a LARGE repository (intentionally) so expect some initial setup time and make sure you're on good bandwidth.

      Feel free to improve the README for particular platforms, etc. We will save it for inclusion in the real afwdata if the RFC is adopted.
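For readers unfamiliar with how GitLFS keeps large files out of the repository itself: the Git repository stores only a small text "pointer" for each tracked file, and the real bytes live on the LFS server. The sketch below builds such a pointer following the published pointer format (a version line, a SHA-256 object ID, and a size). The function name `make_lfs_pointer` and the sample payload are illustrative assumptions, not part of the afwdata setup.

```python
import hashlib


def make_lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS pointer file for the given content.

    Hypothetical helper for illustration: the real git-lfs client
    generates pointers itself when a tracked file is added.
    """
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


if __name__ == "__main__":
    # The payload here is a stand-in for a large binary data file.
    print(make_lfs_pointer(b"example binary payload"), end="")
```

Because the pointer is only a few lines of text regardless of how large the data file is, cloning and diffing the Git history stays cheap while the data itself is fetched from the backend server on demand.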

            Activity

            swinbank John Swinbank added a comment -

            You only need read:org access with tokens

            Great!

            jmatt J Matt Peterson [X] (Inactive) added a comment -

            I'd like to comment on GitHub's plans to move away from the v1 API to their new batch API in the near future. The devs have been vocal about their plans to move. They haven't given an exact timeline, but when it happens we'll need to implement the new API before the git-lfs client is pushed upstream to packaging tools. The last time they pushed, it took about three days to hit Homebrew, and the devs were disappointed it took so long. I've contacted the devs and will keep the RFC updated.

            Update after four days: No update on a roadmap from the Gitter channel yet.
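To make the server-side work concrete, here is a rough sketch of what a batch request looks like in the batch API as it was later documented publicly: a single JSON POST to `<lfs-url>/objects/batch` that names many objects at once, rather than one request per object. The exact shape at the time of this RFC may have differed, and `build_batch_request` is a hypothetical helper that only constructs the headers and body without contacting any server.

```python
import json

# Media type used by Git LFS API requests and responses.
LFS_MEDIA_TYPE = "application/vnd.git-lfs+json"


def build_batch_request(operation, objects):
    """Build headers and a JSON body for a Git LFS batch request.

    `objects` is an iterable of (oid, size) pairs. Hypothetical helper
    for illustration; it performs no network I/O.
    """
    if operation not in ("download", "upload"):
        raise ValueError(f"unsupported operation: {operation!r}")
    body = {
        "operation": operation,
        "transfers": ["basic"],
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    }
    headers = {"Accept": LFS_MEDIA_TYPE, "Content-Type": LFS_MEDIA_TYPE}
    return headers, json.dumps(body)


if __name__ == "__main__":
    # Placeholder object ID and size; real values come from pointer files.
    headers, payload = build_batch_request("download", [("0" * 64, 12345)])
    print(headers)
    print(payload)
```

A server implementing this endpoint answers with, for each object, the actions (such as a download URL) that the client should take, which is the part an in-house backend would need to provide.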

            frossie Frossie Economou added a comment -

            Both the Supply and the Demand side seem to have no objections, so this RFC is adopted.

            frossie Frossie Economou added a comment -

            Here's an update on where we are with the implementation. We have done some destructive testing to satisfy ourselves that we have production robustness and reliability, and we have identified a bunch of issues. The last of the internal ones were addressed this week. We do need the upcoming version of the GitLFS client to be released, as it resolves what we considered a serious bug: under network interruption conditions, corrupt blobs were created without throwing an error that could be detected by, say, the CI. The fix for this is on the GitLFS client master and has been tested, so it should be a matter of days before we can roll out formally and switch afwdata.
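As an illustration of the kind of check that would catch a corrupt blob in CI, the sketch below compares downloaded bytes against the SHA-256 object ID and size recorded in the corresponding pointer file. This is not the fix made in the GitLFS client; it is a minimal independent check, and the function names `parse_pointer` and `blob_matches_pointer` are hypothetical.

```python
import hashlib


def parse_pointer(text: str) -> dict:
    """Extract the SHA-256 object ID and size from pointer file text."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line)
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unexpected hash algorithm: {algo!r}")
    return {"oid": digest, "size": int(fields["size"])}


def blob_matches_pointer(blob: bytes, pointer_text: str) -> bool:
    """Return True only if the blob has the size and hash the pointer records."""
    expected = parse_pointer(pointer_text)
    if len(blob) != expected["size"]:
        return False
    return hashlib.sha256(blob).hexdigest() == expected["oid"]


if __name__ == "__main__":
    good = b"example binary payload"
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{hashlib.sha256(good).hexdigest()}\n"
        f"size {len(good)}\n"
    )
    print(blob_matches_pointer(good, pointer))       # intact blob
    print(blob_matches_pointer(good[:-5], pointer))  # truncated, as after a dropped connection
```

A CI job could run a check like this over every tracked data file after checkout and fail the build on the first mismatch, so that a truncated download is reported rather than silently used.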

            frossie Frossie Economou added a comment -

            Testing Implementation status (And it is implemented)


              People

              • Assignee:
                frossie Frossie Economou
                Reporter:
                frossie Frossie Economou
                Watchers:
                Frossie Economou, J Matt Peterson [X] (Inactive), John Swinbank, Jonathan Sick, Joshua Hoblitt, Kian-Tat Lim, Tim Jenness