# A standard pattern for class attributes as interfaces

XMLWordPrintable

#### Details

• Type: RFC
• Resolution: Unresolved
• Component/s:
• Labels:
None

#### Description

It is a relatively common problem in Python to define a base-class interface that demands that all derived classes provide a class attribute with a particular name, e.g.:

 class BaseAlgorithm(ABC):  ConfigClass = None  """Type of the configuration object for this algorithm.    Must be set by all derived classes to a subclass of  lsst.pex.config.Config.  """

The approach above is ubiquitous in our codebase (especially in Task), but I regard it as a serious antipattern, because it is all too easy for derived classes to ignore this instruction, especially if the base class initializes the attribute to a value of the appropriate type as a sort of attempt at documentation (rather than a viable default), yielding silently incorrect behavior (I have seen this in the wild, in our codebase, with the problem going unnoticed for months).

A slightly better approach is to leave the attribute undefined in the base class, which can be done by using a type annotation instead:

 class BaseAlgorithm(ABC):  ConfigClass: typing.Type[lsst.pex.config.Config]  """Type of the configuration object for this algorithm.    Must be set by all derived classes to a subclass of  lsst.pex.config.Config.  """

This will cause any attempt to access the attribute to fail, as it should, and the type annotation clarifies the type on its own.  But it'd be nice if we could get a failure earlier.  We can move it up to the point that the derived class is constructed by using the abc module an a classmethod, instead of an attribute:

 class BaseAlgorithm(ABC):  @classmethod  @abstractmethod  def getConfigClass(cls):  """Return the type of the configuration object for this algorithm."""  raise NotImplementedError()

This is my preferred solution to the problem, because:

• It is the built-in approach to defining abstract interfaces in Python.
• It is immediately obvious to any reader familiar with the abc module (and all of our developers should be, I think) what is expected of any derived class, without any documentation.  (I am not by any means suggesting that documentation could or should be skipped in this case - but it is all too easy for the author of a class to unknowingly provide incomplete or vague documentation, especially for derived-class implementers, and in-code declaration of an abstract method is a strong guard against that.)
• The abc module provides features for edge cases like, "I want to provide a default, but want to require derived classes to opt-in" (i.e. an abstractmethod with an implementation that's called with super).
• Being able to change the implementation to calculate something rather than return a constant is more often a feature than a bug (especially because we are talking about defining an interface without assuming particular implementations).
• Static analysis tools (at least MyPy; maybe some linters, too?) are aware of the abc module and will catch cases where such a method has been forgotten but code attempts to construct the derived class instance anyway.
• I believe an instantiation-time check (when static checking is possible) is enough to catch violations early enough, in practice.
• No development or maintenance of a custom solution (which would have to involve some amount of metaprogramming) is needed.

I am deeply unsympathetic to the argument that we can't tolerate an extra pair of parenthesis to make this a method call.

In the code-review debate that inspired this RFC, Nate Lust made a point that I am much more sympathetic to: that the instantiation-time checks done by abc are both later than we'd like and incomplete (they just check that a derived class attribute exists, not the degree to which it is compatible with the base class).  I don't think this is enough of a problem that it merits ignoring the built-in solution in favor of providing our own, and I think I'd also vote for the undefined-but-type-annotated approach over any custom metaprogramming I can think of (but we'll see if we get anything more concrete).

In any case, I think we do need to write something down in the dev guide (and provide utility code, if that's what we decide to recommend), because this is a common problem, we've got (IMO) a bad solution out in the wild and likely to be cargo-culted, and we shouldn't have developers inventing their own solutions every time they come across it (especially if it involves new metaprogramming each time).  Since I wrote this RFC, I'm considering the abstract classmethod approach to be the proposal on the table, but I'm curious what others think.

#### Activity

Jim Bosch created issue -
Field Original Value New Value
Description It is a relatively common problem in Python to define a base-class interface that demands that all derived classes provide a class attribute with a particular name, e.g.:
{code:java}
class BaseAlgorithm(ABC):
ConfigClass = None
"""Type of the configuration object for this algorithm.

Must be set by all derived classes to a subclass of
lsst.pex.config.Config.
"""{code}
The approach above is ubiquitous in our codebase (especially in {{Task}}), but I regard it as a serious antipattern, because it is all too easy for derived classes to ignore this instruction, especially if the base class initializes the attribute to a value of the appropriate type as a sort of attempt at documentation (rather than a viable default), yielding silently incorrect behavior (I have seen this in the wild, in our codebase, with the problem going unnoticed for months).

A slightly better approach is to leave the attribute undefined in the base class, which can be done by using a type annotation instead:
{code:java}
class BaseAlgorithm(ABC):
ConfigClass: typing.Type[lsst.pex.config.Config]
"""Type of the configuration object for this algorithm.

Must be set by all derived classes to a subclass of
lsst.pex.config.Config.
"""{code}
This will cause any attempt to access the attribute to fail, as it should, and the type annotation clarifies the type on its own.  But it'd be nice if we could get a failure earlier.  We can move it up to the point that the derived class is constructed by using the {{abc}} module an a {{classmethod}}, instead of an attribute:
{code:java}
class BaseAlgorithm(ABC):
@classmethod
@abstractmethod
def getConfigClass(cls):
raise NotImplementedError(){code}
This is my preferred solution to the problem, because:
* It is *the* built-in approach to defining abstract interfaces in Python.
* It is immediately obvious to any reader familiar with the {{abc}} module (and all of our developers should be, I think) what is expected of any derived class, without any documentation.  (I am not by any means suggesting that documentation could or should be skipped in this case - but it is all too easy for the author of a class to unknowingly provide incomplete or vague documentation, especially for derived-class implementers, and in-code declaration of an abstract method is a strong guard against that.)
* The {{abc}} module provides features for edge cases like, "I want to provide a default, but want to require derived classes to opt-in" (i.e. an {{abstractmethod}} with an implementation that's called with {{super}}).
* Being able to change the implementation to calculate something rather than return a constant is more often a feature than a bug (especially because we are talking about defining an interface _without_ assuming particular implementations).
* Static analysis tools (at least MyPy; maybe some linters, too?) are aware of the {{abc}} module and will catch cases where such a method has been forgotten but code attempts to construct the derived class instance anyway.
* I believe an instantiation-time check (when static checking is possible) is enough to catch violations early enough, in practice.
* No development or maintenance of a custom solution (which would have to involve some amount of metaprogramming) is needed.

I am _deeply_ unsympathetic to the argument that we can't tolerate an extra pair of parenthesis to make this a method call.

The debate that inspired this RFC, [~nlust] made a point that I am much more sympathetic to: that the instantiation-time checks done by {{abc}} are both later than we'd like and incomplete (they just check that a derived class attribute exists, not the degree to which it is compatible with the base class).  I don't think this is enough of a problem that it merits ignoring the built-in solution in favor of providing our own, and I think I'd also vote for the undefined-but-type-annotated approach over any custom metaprogramming I can think of (but we'll see if we get anything more concrete).

In any case, I think we do need to write something down in the dev guide (and provide utility code, if that's what we decide to recommend), because this is a common problem, we've got (IMO) a bad solution out in the wild and likely to be cargo-culted, and we shouldn't have developers inventing their own solutions every time they come across it (especially if it involves _new_ metaprogramming each time).  Since I wrote this RFC, I'm considering the abstract {{classmethod}} approach to be the proposal on the table, but I'm curious what others think.
It is a relatively common problem in Python to define a base-class interface that demands that all derived classes provide a class attribute with a particular name, e.g.:
{code:java}
class BaseAlgorithm(ABC):
ConfigClass = None
"""Type of the configuration object for this algorithm.

Must be set by all derived classes to a subclass of
lsst.pex.config.Config.
"""{code}
The approach above is ubiquitous in our codebase (especially in {{Task}}), but I regard it as a serious antipattern, because it is all too easy for derived classes to ignore this instruction, especially if the base class initializes the attribute to a value of the appropriate type as a sort of attempt at documentation (rather than a viable default), yielding silently incorrect behavior (I have seen this in the wild, in our codebase, with the problem going unnoticed for months).

A slightly better approach is to leave the attribute undefined in the base class, which can be done by using a type annotation instead:
{code:java}
class BaseAlgorithm(ABC):
ConfigClass: typing.Type[lsst.pex.config.Config]
"""Type of the configuration object for this algorithm.

Must be set by all derived classes to a subclass of
lsst.pex.config.Config.
"""{code}
This will cause any attempt to access the attribute to fail, as it should, and the type annotation clarifies the type on its own.  But it'd be nice if we could get a failure earlier.  We can move it up to the point that the derived class is constructed by using the {{abc}} module an a {{classmethod}}, instead of an attribute:
{code:java}
class BaseAlgorithm(ABC):
@classmethod
@abstractmethod
def getConfigClass(cls):
"""Return ype of the configuration object for this algorithm."""
raise NotImplementedError(){code}
This is my preferred solution to the problem, because:
* It is *the* built-in approach to defining abstract interfaces in Python.
* It is immediately obvious to any reader familiar with the {{abc}} module (and all of our developers should be, I think) what is expected of any derived class, without any documentation.  (I am not by any means suggesting that documentation could or should be skipped in this case - but it is all too easy for the author of a class to unknowingly provide incomplete or vague documentation, especially for derived-class implementers, and in-code declaration of an abstract method is a strong guard against that.)
* The {{abc}} module provides features for edge cases like, "I want to provide a default, but want to require derived classes to opt-in" (i.e. an {{abstractmethod}} with an implementation that's called with {{super}}).
* Being able to change the implementation to calculate something rather than return a constant is more often a feature than a bug (especially because we are talking about defining an interface _without_ assuming particular implementations).
* Static analysis tools (at least MyPy; maybe some linters, too?) are aware of the {{abc}} module and will catch cases where such a method has been forgotten but code attempts to construct the derived class instance anyway.
* I believe an instantiation-time check (when static checking is possible) is enough to catch violations early enough, in practice.
* No development or maintenance of a custom solution (which would have to involve some amount of metaprogramming) is needed.

I am _deeply_ unsympathetic to the argument that we can't tolerate an extra pair of parenthesis to make this a method call.

In the code-review debate that inspired this RFC, [~nlust] made a point that I am much more sympathetic to: that the instantiation-time checks done by {{abc}} are both later than we'd like and incomplete (they just check that a derived class attribute exists, not the degree to which it is compatible with the base class).  I don't think this is enough of a problem that it merits ignoring the built-in solution in favor of providing our own, and I think I'd also vote for the undefined-but-type-annotated approach over any custom metaprogramming I can think of (but we'll see if we get anything more concrete).

In any case, I think we do need to write something down in the dev guide (and provide utility code, if that's what we decide to recommend), because this is a common problem, we've got (IMO) a bad solution out in the wild and likely to be cargo-culted, and we shouldn't have developers inventing their own solutions every time they come across it (especially if it involves _new_ metaprogramming each time).  Since I wrote this RFC, I'm considering the abstract {{classmethod}} approach to be the proposal on the table, but I'm curious what others think.
 Link This issue is triggering DM-33054 [ DM-33054 ]
 Status Proposed [ 10805 ] Adopted [ 10806 ]

#### People

Assignee:
Jim Bosch
Reporter:
Jim Bosch
Watchers:
Fred Moolekamp, Jim Bosch, John Parejko, Jonathan Sick, Kian-Tat Lim, Nate Lust, Tim Jenness