RFC-45: Process for maintaining Copyright information in DM source code

    Details

    • Type: RFC
    • Status: Implemented
    • Resolution: Done
    • Component/s: DM
    • Labels:
      None
    • Location:
      Reply to this ticket

      Description

      Purpose: Adopt the most lightweight (from a maintenance standpoint) copyright process for software written during construction that is still compatible with our contractual obligations and open source principles.

      Proposal for default practice:

      1. Each file has a header that says “See COPYRIGHT file at the top of the source tree”.

      2. The COPYRIGHT file is considered a template file, with sections of it replaceable by robots.

      3. The COPYRIGHT file has one line per institution that contributed to the code, with a date range, e.g.

      Copyright University of Waterloo (2012-2015)

      4. If people from two institutions are making substantial contributions to that code, they add their institution to the copyright line.

      Copyright University of Waterloo and AURA/LSST (2012-2015)

      5. Additional boilerplate will be included in the COPYRIGHT file reflecting the AURA/LSST-institution contractual arrangements (specifically perpetual license to AURA/LSST to modify and redistribute)

      6. Requirement of developers: Use your institutional email address for commits

      7. Requirement on SQuaRE: Insert template into repos. Periodically update the end date of the notice and run a simple check to make sure the list of institutions is consistent (e.g. a file has a UW line if all the commits are from people with UW addresses). Scan for non-institutional emails in the commits (e.g. people pushing with their gmail address).

      8. This is the default process for "normal" work. If someone is developing code that they are worried is of commercial value, or has other concerns that require a more defensive process, they are free to engage in more heavyweight processes such as including copyright statements in every file. They would undertake to maintain that non-default process.
      Note that the construction contracts do not require copyright assignment to AURA/LSST and we will not require copyright assignment from open source contributors.

      #6 is the most significant as VCS information can legally be used to resolve disputed claims.
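The consistency check in item 7 could be sketched roughly as follows. This is only an illustration under stated assumptions: the domain-to-institution mapping and the `.example` domains are hypothetical placeholders, and a real check would need a project-maintained mapping and the actual COPYRIGHT template.

```python
# Hypothetical mapping from commit email domain to the institution name
# as it would appear in the COPYRIGHT file. The domains are placeholders.
DOMAIN_TO_INSTITUTION = {
    "waterloo.example": "University of Waterloo",
    "lsst.example": "AURA/LSST",
}


def institutions_from_emails(emails):
    """Return the set of institutions implied by commit author emails."""
    found = set()
    for email in emails:
        domain = email.rsplit("@", 1)[-1].lower()
        institution = DOMAIN_TO_INSTITUTION.get(domain)
        if institution is not None:
            found.add(institution)
    return found


def missing_institutions(emails, copyright_text):
    """List institutions that have commits but are not named in COPYRIGHT."""
    return sorted(
        name
        for name in institutions_from_emails(emails)
        if name not in copyright_text
    )
```

Addresses from unknown domains are ignored here; flagging them is a separate scan.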

      Background (skip if you’re fine with the above):

      • Many conventions on copyright in open source come from FSF guidance for GPL-licensed source, but that guidance was drawn up for specific situations that are not a particular worry to us (e.g. commercial parties subverting open source code).
      • Once again we are guided by the Software Freedom Law Center

      https://www.softwarefreedom.org/resources/2012/ManagingCopyrightInformation.html

      Summary:

      Copyright is implicit (it does not need to be asserted). A central Copyright file notice is therefore sufficient in cases where it is unlikely that a file can be separated from its source tree. The Version Control system is considered adequate proof of individual contributions.

            Activity

            swinbank John Swinbank added a comment -

            What are the implications for code which we (want to) merge to the stack but is not written by an LSST partner institution? For example, if somebody not working for Princeton had made changes to the HSC stack, and we want to merge them back into LSST. At the least, it will be hard to enforce rules about e-mail addresses on them; in the worst case, the list of institutions claiming copyright on parts of the LSST stack could balloon.

            rowen Russell Owen added a comment -

            I really like the proposal. John raises a good point, but I prefer one bloated file that we can usually ignore to having a large block of boilerplate in every source file.

            tjenness Tim Jenness added a comment -

            If you merge code back from non-partner institutions then, yes, a note should be added to the COPYRIGHT file indicating that contribution. It doesn't matter if that gets long as the revision control should enable tracking. Regarding emails, it's important that the author field be traceable to the COPYRIGHT file. If the work was done by Joe Blogs <blogs@pobox.net> and the copyright added is Copyright 2015 J. Blogs then that is fine. What is a problem is if J. Blogs actually works for Big University in California and there is no way to link the statement in the copyright file to the author information. It is possible to fix author information before merging.

            They do have to agree to the license terms though. I worry that the "perpetual license to AURA/LSST to do whatever they want, including changing the license" is not compatible with GPLv3 and open source in general, but I am not a lawyer. That language works for partner institutions but other places will only be agreeing to GPLv3. If the person working on HSC code does not agree with the LSST license terms then that code could not be merged.

            swinbank John Swinbank added a comment -

            If you merge code back from non-partner institutions then, yes, a note should be added to the COPYRIGHT file indicating that contribution. [...]

            Silly question, but: is that a definitive answer, or is it a suggestion?

            I could imagine a situation in which LSST/AURA decided they would not be happy relying on code to which they or a partner institution did not hold copyright. I could equally imagine a policy insisting that non-partners explicitly assign copyright, or even that contributions from outside the project were not allowed (I don't fully understand the issues, but there's already some concern around using external work for construction). I'm not sure if Tim Jenness is telling me that there's definitely no problem here, or just that he's suggesting a possible approach.

            tjenness Tim Jenness added a comment -

            I was not indicating policy because, as mentioned later in my response, I'm not sure how external code meshes with the perpetual AURA/LSST license clause. In general you can't change the license without agreement from all copyright holders. Frossie Economou is checking up on the license text in the partner contracts but none of that matters for code written by non-partners. If you accept code from third parties and they retain copyright then you would need to get their permission to change to BSD or MIT license later on. If you merge in HSC code that has been substantially modified by a Subaru employee then that is clearly an important test case to deal with.

            So, if we accepted code from third parties, yes the COPYRIGHT file should be updated to reflect that. I don't think you are going to get much traction from external contributors if you insist on copyright assignment (but note, that's exactly what the Free Software Foundation insist on if you contribute to FSF projects).

            rhl Robert Lupton added a comment -
            Copyright is implicit (it does not need to be asserted)

            Berne convention etc. So why do we need to include a copyright at all? If the VCS is sufficient proof of authorship can't we just get a copyright assignment from each individual?

            I take Tim's and others' concerns about copyright assignment, but isn't that distinct from adding a statement to the files we ship?

            tjenness Tim Jenness added a comment -

            Yes, copyright is implicit but that doesn't help clarity. Python is funny of course because we ship the source code in a "binary" distribution but we won't be shipping the COPYRIGHT file. The source can then become disconnected from the repo and the associated copyright information.

            rhl Robert Lupton added a comment -
            Yes, copyright is implicit but that doesn't help clarity.

            I agree. But in that case can't we use a simple file that explains our copyright and doesn't need to be updated with dates and institutions?

            frossie Frossie Economou added a comment - - edited

            To answer Robert's question as to why name institutions at all, here are some reasons you'd want to:

            1. While in a court of law we do not have to assert copyright explicitly, in practice I would like to have a Copyright statement that includes the language included in the contract ("the authors grant perpetual irrevocable license to AURA/LSST") so that if someone like Debian challenges me about whether I have the right to release the code under a particular license, I can say "yes because I have this permission noted from the named copyright holders". Not for legal reasons, but for "don't waste my time" reasons.

            2. There are also situations and licenses that require people to get the explicit permission of the copyright holder for something so it's nice to have that recorded. Granted I can't think of any that apply to the code and the licenses we are planning on using, but let's just do this once, eh.

            3. There are people who are mandated by their institutions to explicitly assert copyright (I have not checked specifically ours, but the University of California does).

            4. In the bizarre situation where somebody issues a DMCA takedown notice, it would allow us to be contacted and respond.

            #3 is the killer. Otherwise there would be nothing stopping us from doing as many projects do and writing "Copyright LSST-DM developers". Unfortunately, we the developers do not own the copyright in this case.

            Russell's question: A significant contributor (minor changes are not sufficient to affect copyright) would enter their institution's name or their personal name (as per their own requirements) in the COPYRIGHT file. I believe asking people to do that (and thus signing onto the contractual language granting AURA/LSST rights) is far less obnoxious than asking for copyright assignment.

            I would like to separate this simple matter from the more complex licensing matter. Right now the aim is to do something that does not violate our institutional agreements and makes things as easy as possible for developers.

            If the Software Freedom Law Center opines that the contractual language does not grant me the right to license, relicense, or multi-license the code, we have bigger problems anyway and we (=Jeff) have to go back and fix them. We don't want to wait for that though since we are already working on new code. If they decide AURA/LSST have to own the copyright, it's back to square one.

            Here is an example of a multi-institutional copyright notice:

            http://www.w3.org/Consortium/Legal/2002/copyright-software-short-notice-20021231.html

            tjenness Tim Jenness added a comment - - edited

            Regarding "use of institutional email for commits", this isn't currently happening in all cases: e.g. https://github.com/lsst/afw/commit/eb85b0cc17e4fded187315a428fddf4868687d2c where Bob Armstrong is credited with the commit via a gmail address; also https://github.com/lsst/afw/commit/f18c62333948579a4c2bbb8581248bc186b5c1dc where Scott Daniel is using a gmail account (and not declaring his name in the commit author field either; just a user id). One of the problems with GitHub is that it's not easy to spot when a personal address is used, because GitHub collates all email addresses and doesn't make them visible in the UI. This means that issues like this are not easily spotted in the code review phase.
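Because the addresses are recorded in the commits themselves, a scan of a local clone can surface them. The following is a minimal sketch, not an agreed tool: the `PERSONAL_DOMAINS` list and both function names are assumptions made for illustration, and the git invocation relies only on the standard `git log --format` placeholders `%an`, `%ae`, and `%x09` (a tab).

```python
import subprocess

# Assumed list of personal mail providers; a real scan would need a
# project-maintained list or an allow-list of institutional domains.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}


def personal_addresses(log_lines):
    """Return sorted unique (name, email) pairs whose domain looks personal.

    Each line is expected in the form "name<TAB>email".
    """
    flagged = set()
    for line in log_lines:
        name, sep, email = line.rstrip("\n").partition("\t")
        if not sep:
            continue  # skip lines that are not in the expected form
        domain = email.rsplit("@", 1)[-1].lower()
        if domain in PERSONAL_DOMAINS:
            flagged.add((name, email))
    return sorted(flagged)


def author_lines(repo_path="."):
    """Read the author name and email of every commit in a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an%x09%ae"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Running `personal_addresses(author_lines("path/to/clone"))` would list the commits' authors that need following up before or after merging.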

            ktl Kian-Tat Lim added a comment -

            We will have a COPYRIGHT file per-repo with auto-updated dates and institutions or people listed; in source files, an optional pointer to the COPYRIGHT file can be included.
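As an illustration of what auto-updating the dates might involve, here is a small sketch. It assumes notice lines follow the form used in the proposal, "Copyright <holder> (<start>-<end>)"; the regular expression and the function name are hypothetical, and the sketch extends every notice rather than checking which holders actually contributed in the new year.

```python
import re

# Matches notice lines in the form used in the proposal, e.g.
# "Copyright University of Waterloo (2012-2015)" or "Copyright J. Blogs (2015)".
NOTICE = re.compile(r"^(Copyright .+ \()(\d{4})(?:-(\d{4}))?(\))\s*$")


def update_end_year(copyright_text, year):
    """Extend the end year of every matching notice line to `year`."""
    trailing = "\n" if copyright_text.endswith("\n") else ""
    updated = []
    for line in copyright_text.splitlines():
        match = NOTICE.match(line)
        if match:
            prefix, start, end, suffix = match.groups()
            # Never move the end year backwards.
            last = max(int(end or start), year)
            if last == int(start):
                line = f"{prefix}{start}{suffix}"
            else:
                line = f"{prefix}{start}-{last}{suffix}"
        updated.append(line)
    return "\n".join(updated) + trailing
```

Lines that do not match the notice pattern, such as the contractual boilerplate, pass through unchanged.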

            jbosch Jim Bosch added a comment -

            While this has been adopted, it seems like we're missing some information in terms of actually being able to carry it out in new code (notably, the lack of a template COPYRIGHT file). If there's an issue for that, could we link it here?

            In any case, I'm assuming we're not quite supposed to be applying this procedure just yet; please let me know if that's incorrect.

            tjenness Tim Jenness added a comment -

            Regarding the "AURA/LSST" copyright stated above, is "AURA/LSST" an actual legal entity? I note that Gemini and STScI use:

            Copyright(c) 2015 Association of Universities for Research in Astronomy, Inc.
            

            and not "AURA/Gemini" and "AURA/STScI".

            jsick Jonathan Sick added a comment -

            I'm making a new package at the moment, so naturally I'd like to implement the new style of copyright management adopted in this RFC. This exercise will also help me document (and therefore implement, to some extent) this RFC with revised templates.

            I'm fine with the COPYRIGHT file and the in-source comment boilerplate. I'm stumbling on the LICENSE file, though, which the RFC doesn't entirely spell out.

            The problem is that our canonical LICENSE template (https://www.lsstcorp.org/LegalNotices/LsstLicenseStatement.txt) isn't GPLv3, but rather is a preamble + GPLv3 (plus licenses for all third party repos).

            The preamble includes a copyright statement:

            LSST Data Management System Software
            Copyright 2008-2014 AURA/LSST.
            

            Clearly this works against the consolidation that this RFC is trying to achieve. Can we eliminate the copyright line here? Or alternatively, change it to read:

            LSST Data Management System Software
            See COPYRIGHT file.
            

            Next, I noticed that the official license statement includes licenses for all our third-party dependencies. Surely we can eliminate this from LICENSE statements for individual stack packages, right?

            Finally, I question the utility of putting any sort of custom preamble in a LICENSE file.

            Specifically, notice that GitHub 'knows' (displays on the repo's landing page) that the ltd-keeper project is MIT-licensed because its LICENSE file is the MIT license, verbatim. On the other hand, GitHub doesn't know the license of our Stack packages (e.g. validate_drp) because we modify the GPLv3 to such a large extent with our preamble.

            A lot of the information being put in the standard license actually belongs in either the COPYRIGHT (who contributed to the code) or the README (what the code is) rather than in the LICENSE.

            Thus I think that, ideally, the LICENSE file in Stack packages should just be the verbatim GPLv3 license text.

            Is there any objection to writing the new generation of LICENSE/COPYRIGHT/README files this way?

            jsick Jonathan Sick added a comment -

            See this recent GitHub blog post for background on GitHub’s license metadata service: https://github.com/blog/2252-license-now-displayed-on-repository-overview

            For a look at my implementation of RFC-45 in a Stack package, see https://github.com/lsst/validate_base/tree/tickets/DM-7692

            tjenness Tim Jenness added a comment -

            I can't see a comment on this above, but FSF really do want their boilerplate in each source file and not just a one-liner pointing to a different file. See https://www.gnu.org/licenses/gpl-howto.html
            I think if this RFC ended up with

            • Add the GPL copying permission statement
            • A reference to a COPYRIGHT file for the copyright information

            Then it would seem that we could proceed without having to ask a lawyer. Removing the copying permission statement might be unwise.

            BSD is different of course and that leads to a discussion of DM-5031.

            jsick Jonathan Sick added a comment -

            Thanks Tim Jenness, I think your comment clarifies an implementation strategy sufficiently that I feel confident enough in being able to take on DM-5383 and DM-5382. We can move further discussion of implementation details to DM-5383, in particular.

            ktl Kian-Tat Lim added a comment -

            OK, let's move forward.

            tjenness Tim Jenness added a comment -

            Brian Van Klaveren, this RFC is mainly meant to be discussing how to manage copyright and license in our code, not a discussion over which license we want to be using. The basic ideas of:

            • small stub in each source file
            • License in one file.
            • Copyright in one file, institutional rather than "LSST".
            • Use of proper AURA name in AURA copyrights.
            • Committing with corporate email address.

            are all good practice regardless of the specific license choice.
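The first item in that list, a small stub in each source file, lends itself to an automated check. The sketch below is an assumption-laden illustration: the function name, the 30-line window, and the set of comment prefixes are all hypothetical, and it only looks for a comment line mentioning the COPYRIGHT file rather than validating any particular stub wording.

```python
# Comment prefixes for a few common languages; an assumed, incomplete list.
COMMENT_PREFIXES = ("#", "//", "/*", "*")


def has_copyright_pointer(source_text, max_lines=30):
    """Return True if a comment near the top of the file mentions COPYRIGHT."""
    head = source_text.splitlines()[:max_lines]
    return any(
        "COPYRIGHT" in line and line.lstrip().startswith(COMMENT_PREFIXES)
        for line in head
    )
```

A repository-wide scan could apply this to every tracked source file and report the ones without a pointer.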


              People

              Assignee:
              frossie Frossie Economou
              Reporter:
              frossie Frossie Economou
              Watchers:
              Brian Van Klaveren, Frossie Economou, Jeff Kantor, Jim Bosch, John Parejko, John Swinbank, Jonathan Sick, Kian-Tat Lim, Robert Lupton, Russell Owen, Tim Jenness, Wil O'Mullane, Xiuqin Wu [X] (Inactive)