For historical reasons, our Science Pipelines codebase has long been a hold-out from the prevailing naming conventions in the broader Python world.
I think it's time to start migrating slowly towards similarity with not just the rest of the world, but the rest of DM's Python code as well. That's especially important as the boundaries between the Science Pipelines stack and the rest of DM become more fluid as we integrate the system: we will increasingly encounter cases where the appropriate naming convention for a piece of code is unclear because it's unclear whether it's part of Science Pipelines, and this RFC is in some sense an attempt to head-off repeated code review debates about that subject.
It's also true that many Science Pipelines developers have never internalized camelCase (I still frequently see reviews of code squarely in Science Pipelines written in snake_case), and frankly I'm tired of fighting it.
Finally (selfishly), I think using snake_case would be a concrete improvement for daf_butler in particular:
- To deal with different case handling in the databases we need to support, all table and column names must be lowercase (and hence it's natural for us to break words with underscores). Because we often have Python symbols that map directly to those database symbols, it's quite awkward for them to have different names just because of different naming conventions.
- astro_metadata_translator (in particular) is already defined to be outside the Science Pipelines boundary and hence uses snake_case, but is used heavily within daf_butler and in Gen3 code in obs_base. This also leads to an awkward choice between renaming a symbol to conform to camelCase naming conventions or using the same name for the same thing in violation of naming conventions.
- I have found myself subconsciously avoiding conflicts of the type described above by preferring one-word symbols. While that's ok in rare cases, in other cases it leaves symbols less descriptive than they ought to be, and I think it's fair to say that this is a problem that's hard for original authors to spot (as they know what symbols mean) and the fact that the issue is tied up with conventions makes it harder for reviewers to play a role in addressing it as well.
All that said, I strongly believe we cannot afford to convert existing stable interfaces to snake_case except in extremely rare cases, and we should not try. To paraphrase John Parejko, this is normal, and ok: Python itself (NumPy too) has a mix of snake_case and camelCase for historical reasons, and that is unlikely to ever change.
So, the full proposal:
- snake_case function, attribute, and variable names are permitted in the Science Pipelines code (with CamelCase type names and ALL_CAPS constants, as per PEP8 and our existing conventions for non-Science-Pipelines DM code).
- snake_case public names are preferred in "primarily new" code, such as new modules and classes that do not primarily implement an existing camelCase interface. snake_case local variables names are similarly preferred in new functions, even functions with a camelCase public API. The strict definition of "primarily new code" is case by case and left to developer judgement.
- camelCase is still permitted in new code, and is preferred when adding public methods, arguments, and attributes to existing classes with an established camelCase API, or making modifications to existing functions with camelCase local variables.
- Changing existing APIs from camelCase to snake_case in general is not covered by this RFC; such changes are no different from other API changes, and hence must be RFC'd on their own, approved by the CCB, and (unless the CCB declares otherwise) go through a deprecation period. I would personally be unhappy to see even much reviewer time spent on such changes, even if the changes themselves were made in free/science time, but I don't think that's something an RFC proposal can really weigh in on.
- Under no circumstances should the Python side of a pybind11-wrapped C++ function use a different naming convention than the C++ function. C++ naming conventions and the guidelines for applying them in new/old code are no different from those of Python, but it is never permitted to just change things at the pybind11 level; either both C++ and Python or neither should be changed.
- This RFC also proposes that daf_butler, downstream implementations of interfaces defined in daf_butler (in e.g. obs packages), and the graph-generation and execution code for PipelineTask be converted to snake_case. As the Gen3 middleware has explicitly avoided any stability guarantees (and is still regularly changing in ways that are much more significant than naming) and there are specific benefits to converting daf_butler (as described above), I think this is worth doing. That does not extend to the public interfaces of PipelineTask itself, which I consider too tied to an important existing stable interface (Task) to be worth changing, though I'm willing to debate that point in the RFC.