Location: Reply to this ticket
Purpose: Adopt the copyright process that is most lightweight to maintain for software written during construction, while remaining compatible with our contractual obligations and open source principles.
Proposal for default practice:
1. Each file has a header that says “See COPYRIGHT file at the top of the source tree”.
2. The COPYRIGHT file is considered a template file, with sections of it replaceable by robots.
3. The COPYRIGHT file has one line per institution that contributed to the code, with a date range, e.g.
Copyright University of Waterloo (2012-2015)
4. If people from two institutions are making substantial contributions to that code, they add their institution to the copyright line.
Copyright University of Waterloo and AURA/LSST (2012-2015)
5. Additional boilerplate will be included in the COPYRIGHT file reflecting the AURA/LSST-institution contractual arrangements (specifically perpetual license to AURA/LSST to modify and redistribute)
6. Requirement of developers: Use your institutional email address for commits
7. Requirement on SQuaRE: Insert template into repos. Periodically update the end date of the notice and run a simple check to make sure the list of institutions is consistent (e.g. a file has a UW line if all the commits are from people with UW addresses). Scan for non-institutional emails in the commits (e.g. people pushing with their gmail address).
8. This is the default process for "normal" work. If someone is developing code that they worry has commercial value, or that raises other concerns requiring a more defensive process, they are free to engage in more heavyweight processes such as including copyright statements in every file. They would undertake to maintain that non-default process.
Note that the construction contracts do not require copyright assignment to AURA/LSST and we will not require copyright assignment from open source contributors.
#6 is the most significant as VCS information can legally be used to resolve disputed claims.
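A minimal sketch of the "simple check" in item 7, assuming the COPYRIGHT lines follow the `Copyright <institutions> (YYYY-YYYY)` form shown above. The domain-to-institution mapping and the function names are hypothetical, chosen only for illustration; a real check would need a mapping agreed with each institution.

```python
import re

# Hypothetical mapping from commit email domain to the institution name
# as it would appear in the COPYRIGHT file. Not an agreed list.
DOMAIN_TO_INSTITUTION = {
    "uwaterloo.ca": "University of Waterloo",
    "lsst.org": "AURA/LSST",
}

# Matches lines such as "Copyright University of Waterloo (2012-2015)".
COPYRIGHT_LINE = re.compile(
    r"^Copyright (?P<holders>.+?) \((?P<start>\d{4})-(?P<end>\d{4})\)\s*$"
)


def listed_institutions(copyright_text):
    """Return the set of institutions named on copyright lines."""
    found = set()
    for line in copyright_text.splitlines():
        match = COPYRIGHT_LINE.match(line.strip())
        if match:
            # Joint lines use "A and B", as in the example in item 4.
            found.update(
                name.strip() for name in match.group("holders").split(" and ")
            )
    return found


def check_commits(copyright_text, author_emails):
    """Compare commit author emails against the COPYRIGHT file.

    Returns (missing, unrecognized): institutions with commits but no
    copyright line, and emails whose domain is not a known institution.
    """
    listed = listed_institutions(copyright_text)
    missing = set()
    unrecognized = set()
    for email in author_emails:
        domain = email.rsplit("@", 1)[-1].lower()
        institution = DOMAIN_TO_INSTITUTION.get(domain)
        if institution is None:
            unrecognized.add(email)
        elif institution not in listed:
            missing.add(institution)
    return sorted(missing), sorted(unrecognized)
```

The check only reports discrepancies. Deciding whether a contribution is substantial enough to warrant a copyright line remains a human judgment.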
Background (skip if you’re fine with the above):
- Many conventions on copyright in open source come from FSF guidance for GPL-licensed source, but that guidance was drawn up for specific situations that are not a particular worry to us (e.g. commercial parties subverting open source code).
- Once again we are guided by the Software Freedom Law Center
Copyright is implicit (it does not need to be asserted). A central COPYRIGHT file notice is therefore sufficient in cases where it is unlikely that a file can be separated from its source tree. The version control system is considered adequate proof of individual contributions.
- is triggering
DM-4220 Convert copyright/license statements to one-liners for RFC-45
DM-7042 validate_base API refinement
DM-3487 Implement RFC-45 Copyright File
DM-5383 Update developer docs with Copyright instructions
DM-5382 Update template repository with new copyright rules
- relates to
DM-5031 Enable external code contributions
DM-13565 Put correct copyright/license headers in all jointcal files
DM-13966 Research why license is not detected for daf_butler
DM-4535 Execute stack copyright/license conversion
- To Do
DM-593 Update all DM Software Copyright and License Agreement notices to reflect AURA/LSST
DM-13599 Update copyright info following RFC-45
RFC-908 GPL (and a lot of other licenses) doesn't require one to include long GPL header in every source file
I really like the proposal. John raises a good point, but I prefer one bloated file that we can usually ignore to having a large block of boilerplate in every source file.
If you merge code back from non-partner institutions then, yes, a note should be added to the COPYRIGHT file indicating that contribution. It doesn't matter if that gets long as the revision control should enable tracking. Regarding emails, it's important that the author field be traceable to the COPYRIGHT file. If the work was done by Joe Blogs <firstname.lastname@example.org> and the copyright added is Copyright 2015 J. Blogs then that is fine. What is a problem is if J. Blogs actually works for Big University in California and there is no way to link the statement in the copyright file to the author information. It is possible to fix author information before merging.
They do have to agree to the license terms though. I worry that the "perpetual license to AURA/LSST to do whatever they want, including changing the license" is not compatible with GPLv3 and open source in general, but I am not a lawyer. That language works for partner institutions but other places will only be agreeing to GPLv3. If the person working on HSC code does not agree with the LSST license terms then that code could not be merged.
If you merge code back from non-partner institutions then, yes, a note should be added to the COPYRIGHT file indicating that contribution. [...]
Silly question, but: is that a definitive answer, or is it a suggestion?
I could imagine a situation in which LSST/AURA decided they would not be happy relying on code to which they or a partner institution did not hold copyright. I could equally imagine a policy insisting that non-partners explicitly assign copyright, or even that contributions from outside the project were not allowed (I don't fully understand the issues, but there's already some concern around using external work for construction). I'm not sure if Tim Jenness is telling me that there's definitely no problem here, or just that he's suggesting a possible approach.
I was not indicating policy; as mentioned later in my response, I'm not sure how external code meshes with the perpetual AURA/LSST license clause. In general you can't change the license without agreement from all copyright holders. Frossie Economou is checking up on the license text in the partner contracts but none of that matters for code written by non-partners. If you accept code from third parties and they retain copyright then you would need to get their permission to change to BSD or MIT license later on. If you merge in HSC code that has been substantially modified by a Subaru employee then that is clearly an important test case to deal with.
So, if we accepted code from third parties, yes the COPYRIGHT file should be updated to reflect that. I don't think you are going to get much traction from external contributors if you insist on copyright assignment (but note, that's exactly what the Free Software Foundation insist on if you contribute to FSF projects).
|Copyright is implicit (it does not need to be asserted)|
Berne convention etc. So why do we need to include a copyright at all? If the VCS is sufficient proof of authorship can't we just get a copyright assignment from each individual?
I take Tim's and others' concerns about copyright assignment, but isn't that distinct from adding a statement to the files we ship?
Yes, copyright is implicit but that doesn't help clarity. Python is funny of course because we ship the source code in a "binary" distribution but we won't be shipping the COPYRIGHT file. The source can then become disconnected from the repo and the associated copyright information.
|Yes, copyright is implicit but that doesn't help clarity.|
I agree. But in that case can't we use a simple file that explains our copyright and doesn't need to be updated with dates and institutions?
To answer Robert's question as to why name institutions at all, here are some reasons you'd want to:
1. While in a court of law we do not have to assert copyright explicitly, in practice I would like to have a Copyright statement that includes the language included in the contract ("the authors grant perpetual irrevocable license to AURA/LSST") so that if someone like Debian challenges me about whether I have the right to release the code under a particular license, I can say "yes because I have this permission noted from the named copyright holders". Not for legal reasons, but for "don't waste my time" reasons.
2. There are also situations and licenses that require people to get the explicit permission of the copyright holder for something so it's nice to have that recorded. Granted I can't think of any that apply to the code and the licenses we are planning on using, but let's just do this once, eh.
3. There are people who are mandated by their institutions to explicitly assert copyright (I have not checked specifically ours, but the University of California does).
4. In the bizarre situation where somebody issues a DMCA takedown notice, it would allow us to be contacted and respond.
#3 is the killer. Otherwise there would be nothing stopping us from doing as many projects do and writing "Copyright LSST-DM developers". Unfortunately, we the developers do not own the copyright in this case.
Russell's question: A significant contributor (minor changes are not sufficient to affect copyright) would enter their institution's name or their personal name (as per their own requirements) in the COPYRIGHT file. I believe asking people to do that (and thus signing onto the contractual language granting AURA/LSST rights) is far less obnoxious than asking for copyright assignment.
I would like to separate this simple matter from the more complex licensing matter. Right now the aim is to do something that does not violate our institutional agreements and makes things as easy as possible for developers.
If the Software Freedom Law Center opines that the contractual language does not grant me the right to license, relicense, or multi-license the code, we have bigger problems anyway and we (=Jeff) have to go back and fix them. We don't want to wait for that though since we are already working on new code. If they decide AURA/LSST have to own the copyright, it's back to square one.
Here is an example of a multi-institutional copyright notice:
Regarding "use of institutional email for commits", this isn't currently happening in all cases: e.g. https://github.com/lsst/afw/commit/eb85b0cc17e4fded187315a428fddf4868687d2c where Bob Armstrong is credited with the commit via a gmail address; also https://github.com/lsst/afw/commit/f18c62333948579a4c2bbb8581248bc186b5c1dc where Scott Daniel is using a gmail account (and not declaring his name in the commit author field either; just a user id). One of the problems with GitHub is that it's not easy to spot when a personal address is used because GitHub collates all email addresses and doesn't make them visible in the UI. This means that issues like this are not easily spotted in the code review phase.
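Since the GitHub UI hides author addresses, one way to surface them before merging is to read them from `git log` directly. The following is a rough sketch, not an existing tool: the list of personal email providers is an assumption, and `read_log` and `flag_commits` are hypothetical names. It uses the standard `git log` format placeholders `%H` (commit hash), `%an` (author name), `%ae` (author email) and `%x09` (tab).

```python
import subprocess

# Assumed list of personal email providers; extend as needed.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}

# One commit per line: hash, author name, author email, tab-separated.
LOG_FORMAT = "%H%x09%an%x09%ae"


def read_log(repo_path):
    """Return the raw author listing for every commit reachable from HEAD."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", f"--format={LOG_FORMAT}"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def flag_commits(log_text):
    """Return (sha, problem, value) tuples for commits needing attention."""
    problems = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        sha, name, email = line.split("\t", 2)
        domain = email.rsplit("@", 1)[-1].lower()
        if domain in PERSONAL_DOMAINS:
            problems.append((sha, "personal email address", email))
        # A name without a space is probably a bare user id.
        if " " not in name.strip():
            problems.append((sha, "author field is not a full name", name))
    return problems
```

A reviewer could run `flag_commits(read_log("."))` on a ticket branch before merging, which is the point at which author information can still be fixed.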
We will have a COPYRIGHT file per-repo with auto-updated dates and institutions or people listed; in source files, an optional pointer to the COPYRIGHT file can be included.
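As an illustration of what "auto-updated dates" could mean in practice, here is a small sketch that bumps the end year on lines of the form `Copyright <holder> (YYYY-YYYY)`. The line format is taken from the examples in the proposal; the function name and the decision never to move an end year backwards are assumptions, and lines in other formats (such as `Copyright 2015 J. Blogs`) are left untouched.

```python
import re

# Matches whole lines like "Copyright University of Waterloo (2012-2015)".
COPYRIGHT_RANGE = re.compile(
    r"^(Copyright .+ \()(\d{4})-(\d{4})(\))$", re.MULTILINE
)


def bump_end_year(copyright_text, year):
    """Return the text with every copyright range extended to `year`.

    An end year that is already later than `year` is kept as it is.
    """
    def replace(match):
        end = max(int(match.group(3)), year)
        return f"{match.group(1)}{match.group(2)}-{end}{match.group(4)}"

    return COPYRIGHT_RANGE.sub(replace, copyright_text)
```

This version extends every institution's range. A more careful robot would extend only the lines of institutions that actually committed during the new year, using the commit author information.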
While this has been adopted, it seems like we're missing some information in terms of actually being able to carry it out in new code (notably, the lack of a template COPYRIGHT file). If there's an issue for that, could we link it here?
In any case, I'm assuming we're not quite supposed to be applying this procedure just yet; please let me know if that's incorrect.
Regarding the "AURA/LSST" copyright stated above, is "AURA/LSST" an actual legal entity? I note that Gemini and STScI use:
Copyright(c) 2015 Association of Universities for Research in Astronomy, Inc.
and not "AURA/Gemini" and "AURA/STScI".
I'm making a new package at the moment, so naturally I'd like to implement the new style of copyright management adopted in this RFC. This exercise will also help me document (and therefore implement, to some extent) this RFC with revised templates.
I'm fine with the COPYRIGHT file and the in-source comment boilerplate. I'm stumbling on the LICENSE file, though, which the RFC doesn't entirely spell out.
The problem is that our canonical LICENSE template (https://www.lsstcorp.org/LegalNotices/LsstLicenseStatement.txt) isn't GPLv3, but rather is a preamble + GPLv3 (plus licenses for all third party repos).
The preamble includes a copyright statement:
LSST Data Management System Software
Copyright 2008-2014 AURA/LSST.
Clearly this works against the consolidation that this RFC is trying to achieve. Can we eliminate the copyright line here? Or alternatively, change it to read:
LSST Data Management System Software
See COPYRIGHT file.
Next, I noticed that the official license statement includes licenses for all our third-party dependencies. Surely we can eliminate this from LICENSE statements for individual stack packages, right?
Finally, I question the utility of putting any sort of custom preamble in a LICENSE file.
Specifically, notice that GitHub 'knows' (displays on the repo's landing page) that the ltd-keeper project is MIT-licensed because its LICENSE file is the MIT license, verbatim. On the other hand, GitHub doesn't know the license of our Stack packages (e.g. validate_drp) because we modify the GPLv3 to such a large extent with our preamble.
A lot of the information being put in the standard license actually belongs in either the COPYRIGHT (who contributed to the code) or the README (what the code is) rather than in the LICENSE.
Thus I think that, ideally, the LICENSE file in Stack packages should just be the verbatim GPLv3 license text.
Is there any objection to writing the new generation of LICENSE/COPYRIGHT/README files this way?
See this recent GitHub blog post for background on GitHub’s license metadata service: https://github.com/blog/2252-license-now-displayed-on-repository-overview
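A quick way to find Stack packages whose LICENSE file still carries a preamble is to test whether the file opens with the GPLv3 title lines. This is only a heuristic sketch with a hypothetical function name. It does not reproduce GitHub's license detection, and it does not verify that the rest of the file is the verbatim license text.

```python
def starts_with_gplv3_title(license_text):
    """Return True if the first two non-blank lines are the GPLv3 title.

    The verbatim GPLv3 text begins with "GNU GENERAL PUBLIC LICENSE"
    followed by "Version 3, 29 June 2007"; anything before that is
    treated as a preamble.
    """
    lines = [line.strip() for line in license_text.splitlines() if line.strip()]
    return (
        len(lines) >= 2
        and lines[0] == "GNU GENERAL PUBLIC LICENSE"
        and lines[1].startswith("Version 3")
    )
```

A file that fails this test is a candidate for having its preamble moved into COPYRIGHT or README, as proposed above.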
For a look at my implementation of RFC-45 in a Stack package, see https://github.com/lsst/validate_base/tree/tickets/DM-7692
I can't see a comment on this above, but FSF really do want their boilerplate in each source file and not just a one-liner pointing to a different file. See https://www.gnu.org/licenses/gpl-howto.html
I think if this RFC ended up with
- Add the GPL copying permission statement
- A reference to a COPYRIGHT file for the copyright information
Then it would seem that we could proceed without having to ask a lawyer. Removing the copying permission statement might be unwise.
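If the header ends up with those two parts, their presence can be checked mechanically. The sketch below assumes the pointer wording from item 1 of the proposal and the opening words of the standard GPL copying permission statement ("This program is free software"). The 30-line header window and the function names are arbitrary illustrative choices.

```python
# Wording from item 1 of the proposal.
COPYRIGHT_POINTER = "See COPYRIGHT file at the top of the source tree"
# Opening words of the standard GPL copying permission statement.
GPL_STATEMENT = "This program is free software"


def missing_header_parts(source_text, max_lines=30):
    """Return the names of required header parts absent from a file.

    Only the first `max_lines` lines are inspected (an assumed limit).
    """
    header = "\n".join(source_text.splitlines()[:max_lines])
    missing = []
    if COPYRIGHT_POINTER not in header:
        missing.append("copyright pointer")
    if GPL_STATEMENT not in header:
        missing.append("GPL permission statement")
    return missing


def files_with_incomplete_headers(sources):
    """Map each path in `sources` (path -> text) to its missing parts."""
    report = {}
    for path, text in sources.items():
        missing = missing_header_parts(text)
        if missing:
            report[path] = missing
    return report
```

Matching on fixed substrings is deliberately strict: a header that rewords either part, or wraps it across comment lines, would be reported as missing and need a manual look.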
BSD is different of course and that leads to a discussion of
Thanks Tim Jenness, I think your comment clarifies an implementation strategy sufficiently that I feel confident in taking on DM-5383 and DM-5382. We can move further discussion of implementation details to DM-5383, in particular.
Brian Van Klaveren, this RFC is mainly meant to be discussing how to manage copyright and license in our code, not a discussion over which license we want to be using. The basic ideas of:
- small stub in each source file
- License in one file.
- Copyright in one file, institutional rather than "LSST".
- Use of proper AURA name in AURA copyrights.
- Committing with corporate email address.
are all good practice regardless of the specific license choice.
What are the implications for code which we (want to) merge to the stack but is not written by an LSST partner institution? For example, if somebody not working for Princeton had made changes to the HSC stack, and we want to merge them back into LSST. At the least, it will be hard to enforce rules about e-mail addresses on them; in the worst case, the list of institutions claiming copyright on parts of the LSST stack could balloon.