Question

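In one part of a program I'm working on, I want to perform a linear regression using terms that are functions of the features of a data set X. The exact model is configurable by the user, in particular which terms (or sets of terms) to use. This involves generating a matrix X' in which every row is a function of the corresponding row of X. The columns of X' will be the predictors for my regression.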

For example, suppose my data set is two-dimensional (X has 2 columns). If x and x' denote corresponding rows of X and X', then x' might look like

[ 1, x[0], x[1], x[0] * x[1], sqrt(x[0]), sqrt(x[1]), x[0]**2, x[1]**2 ]

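You can see that the terms come in groups. First comes the 1 (the constant), then the untransformed data (linear terms), then the product of the two data elements (which would be all pairwise products if x had more than two dimensions), and finally the square roots and the squares of the individual elements.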
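The grouping above maps naturally onto vectorized NumPy operations. Below is a minimal sketch, assuming NumPy is available; `expand_features` is a hypothetical name, not part of the asker's program. It builds X' for any number of columns in the same order as the example row, using `numpy.triu_indices(d, k=1)` to enumerate the d*(d-1)/2 column pairs (c1, c2) with c1 < c2.

```python
import numpy as np


def expand_features(X):
    """Build X' from X using the term groups described above:
    constant, linear, pairwise products, square roots, squares."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    # Index arrays for every column pair (i, j) with i < j
    i, j = np.triu_indices(d, k=1)
    groups = [
        np.ones((n, 1)),     # constant term
        X,                   # linear (untransformed) terms
        X[:, i] * X[:, j],   # all pairwise products
        np.sqrt(X),          # square roots
        X ** 2,              # squares
    ]
    # Each group has n rows, so stacking side by side gives X'
    return np.hstack(groups)
```

For a single row [4, 9] this returns [1, 4, 9, 36, 2, 3, 16, 81], which matches the layout of the example row.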

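I need to define all of these term sets in Python somehow, so that each one has a user-readable name, a function that generates the terms, a function that gives the number of terms from the dimension of the input, a function that generates labels for the terms from the data's column labels, and so on. Conceptually these all feel like they should be instances of a TermSet class or something similar, but that doesn't really work because their methods need to differ. My first idea was something like this: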

import numpy

termsets = {}  # Keep track of sets

class SqrtTerms:
    display = 'Square Roots' # user-readable name

    @staticmethod
    def size(d):
        """Number of terms based on input columns"""
        return d

    @staticmethod
    def make(X):
        """Make the terms from the input data"""
        return numpy.sqrt(X)

    @staticmethod
    def labels(columns):
        """List of term labels based off of data column labels"""
        return ['sqrt(%s)' % c for c in columns]

termsets['sqrt'] = SqrtTerms # register class in dict


class PairwiseProductTerms:
    display = 'Pairwise Products'

    @staticmethod
    def size(d):
        return d * (d - 1) // 2  # integer division keeps the count an int

    @staticmethod
    def make(X):
        # Some more complicated code that spans multiple lines
        ...

    @staticmethod
    def labels(columns):
        # Technically a one-liner but also more complicated
        return ['(%s) * (%s)' % (columns[c1], columns[c2])
            for c1 in range(len(columns)) for c2 in range(len(columns))
            if c2 > c1]

termsets['pairprod'] = PairwiseProductTerms
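The dictionary-based workflow can be sketched end to end as follows, assuming NumPy. The `make` body for pairwise products is one possible implementation (the question deliberately elides it), and `build_design_matrix` is a hypothetical helper that stacks the selected term sets into X', collects their labels, and uses `size()` as a sanity check on each block.

```python
import numpy as np

termsets = {}  # registry: short key -> term-set class


class SqrtTerms:
    display = 'Square Roots'

    @staticmethod
    def size(d):
        return d

    @staticmethod
    def make(X):
        return np.sqrt(X)

    @staticmethod
    def labels(columns):
        return ['sqrt(%s)' % c for c in columns]

termsets['sqrt'] = SqrtTerms


class PairwiseProductTerms:
    display = 'Pairwise Products'

    @staticmethod
    def size(d):
        return d * (d - 1) // 2

    @staticmethod
    def make(X):
        # One possible implementation: multiply each column pair
        # (c1, c2) with c1 < c2, in the same order as labels().
        c1, c2 = np.triu_indices(X.shape[1], k=1)
        return X[:, c1] * X[:, c2]

    @staticmethod
    def labels(columns):
        return ['(%s) * (%s)' % (columns[c1], columns[c2])
                for c1 in range(len(columns)) for c2 in range(len(columns))
                if c2 > c1]

termsets['pairprod'] = PairwiseProductTerms


def build_design_matrix(keys, X, columns):
    """Hypothetical helper: stack the chosen term sets into X' and labels."""
    X = np.asarray(X, dtype=float)
    blocks, labels = [], []
    for key in keys:
        ts = termsets[key]
        block = ts.make(X)
        # size() predicts the column count; use it to catch mistakes early
        if block.shape[1] != ts.size(X.shape[1]):
            raise ValueError('%s produced the wrong number of columns' % key)
        blocks.append(block)
        labels.extend(ts.labels(columns))
    return np.hstack(blocks), labels
```

With keys ['sqrt', 'pairprod'], the input row [4, 9, 1] and columns a, b, c, this returns the row [2, 3, 1, 36, 4, 9] and the labels sqrt(a), sqrt(b), sqrt(c), (a) * (b), (a) * (c), (b) * (c).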

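This works: I can retrieve the classes from the dictionary, put the ones I want to use in a list, and call the relevant methods on each one. Still, creating classes that contain nothing but static attributes and methods seems ugly and unpythonic. Another idea I had was a class decorator that could be used like this: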

# Convert bound methods to static ones, assign "display" static
# attribute and add to dict with key "name"
@regression_terms(name='sqrt', display='Square Roots')
class SqrtTerms:
    def size(d):
        return d
    def make(X):
        return numpy.sqrt(X)
    def labels(columns):
        return ['sqrt(%s)' % c for c in columns]

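This gives the same result, but it is cleaner and nicer (for me) to read and write, especially if I need many of these. However, how it actually works is hidden, and anyone else reading it might have a hard time figuring out what is going on. I also thought about writing a metaclass for these, but that sounds like overkill. Is there a better pattern I should use here?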
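For reference, here is one way the `regression_terms` decorator proposed above could be written. This is an assumption about what the asker had in mind, not code from the question: it wraps every plain function defined in the class body in `staticmethod`, sets the `display` attribute, and registers the class in a module-level `termsets` dict.

```python
from types import FunctionType

import numpy

termsets = {}  # Keep track of sets (filled in by the decorator)


def regression_terms(name, display):
    """Hypothetical decorator: make plain functions static, set the
    display attribute and register the class under `name`."""
    def decorate(cls):
        # Snapshot the items first because the loop modifies the class.
        for key, value in list(vars(cls).items()):
            if isinstance(value, FunctionType):
                setattr(cls, key, staticmethod(value))
        cls.display = display
        termsets[name] = cls
        return cls
    return decorate


@regression_terms(name='sqrt', display='Square Roots')
class SqrtTerms:
    def size(d):
        return d

    def make(X):
        return numpy.sqrt(X)

    def labels(columns):
        return ['sqrt(%s)' % c for c in columns]
```

In Python 3 a plain function looked up on the class already behaves like a static one, so the wrapping mainly matters when the methods are called through an instance (or on Python 2, where class access yields unbound methods).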

Answer 1

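Some people will always say this is an abuse of the language. I say Python was designed to be abusable, and the ability to build DSLs that need no parser and yet don't look like Lisp is one of its core strengths.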

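If you really do have a lot of these, use a metaclass. If you do, then in addition to the dictionary of terms you can also have attributes that refer to the terms. This is really nice, because it lets you write code like this: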

print(Terms.termsets)
print(Terms.sqrt)
print(Terms.pairprod)
print(Terms.pairprod.size(5))

which prints output like this:

{'pairprod': <class '__main__.PairwiseProductTerms'>,
 'sqrt': <class '__main__.SqrtTerms'>}
<class '__main__.SqrtTerms'>
<class '__main__.PairwiseProductTerms'>
10

Here is the complete code that does this:

import numpy
from types import FunctionType

class MetaTerms(type):
    """
    This metaclass will let us create a Terms class.
    Every subclass of the Terms class will have its
    methods auto-wrapped as static methods, and
    will be added to the termsets dictionary.
    """
    def __new__(cls, name, bases, attr):
        # Auto-wrap all plain functions as static methods
        for key, value in list(attr.items()):
            if isinstance(value, FunctionType):
                attr[key] = staticmethod(value)
        # Call type.__new__ to finish the job
        return super(MetaTerms, cls).__new__(cls, name, bases, attr)

    def __init__(cls, name, bases, attr):
        # At __init__ time, the class has already been
        # built, so any changes to the bases or attr
        # will not be reflected in the cls.
        # Call type.__init__ to finish the job
        super(MetaTerms, cls).__init__(name, bases, attr)
        # Add the class into the termsets (skip the Terms base itself).
        if name != 'Terms':
            cls.termsets[cls.shortname] = cls

    def __getattr__(cls, name):
        # Only called when normal lookup fails. Raise AttributeError rather
        # than KeyError so hasattr() and getattr(obj, name, default) work.
        try:
            return cls.termsets[name]
        except KeyError:
            raise AttributeError(name) from None

# Python 3 syntax; on Python 2 this was written as
# "class Terms(object):" with "__metaclass__ = MetaTerms" in the body.
class Terms(metaclass=MetaTerms):
    termsets = {}  # Keep track of sets


class SqrtTerms(Terms):
    display = 'Square Roots'  # user-readable name
    shortname = 'sqrt'  # Used to find in Terms.termsets

    def size(d):
        """Number of terms based on input columns"""
        return d

    def make(X):
        """Make the terms from the input data"""
        return numpy.sqrt(X)

    def labels(columns):
        """List of term labels based off of data column labels"""
        return ['sqrt(%s)' % c for c in columns]


class PairwiseProductTerms(Terms):
    display = 'Pairwise Products'
    shortname = 'pairprod'

    def size(d):
        return d * (d - 1) // 2

    def make(X):
        pass

    def labels(columns):
        # Technically a one-liner but also more complicated
        return ['(%s) * (%s)' % (columns[c1], columns[c2])
                for c1 in range(len(columns)) for c2 in range(len(columns))
                if c2 > c1]

print(Terms.termsets)
print(Terms.sqrt)
print(Terms.pairprod)
print(Terms.pairprod.size(5))

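If you tuck the metaclass and the base Terms class away in a separate module, nobody ever has to look at them; they just write "from baseterm import Terms". You could also do some neat auto-discovery/auto-import, so that dropping a module into the right directory automatically adds its terms to your DSL.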
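The auto-discovery idea can be sketched with the standard library alone: import every module in a package, so that any Terms subclasses they define register themselves through the metaclass as a side effect of being imported. `autodiscover` and the package name are hypothetical, and the sketch assumes the term modules live in a regular package that is importable from sys.path.

```python
import importlib
import pkgutil


def autodiscover(package_name):
    """Import every module in the given package and return their names.

    Importing a module runs its class statements, which is enough for
    metaclass-based registration to happen.
    """
    package = importlib.import_module(package_name)
    imported = []
    # package.__path__ lists the directories that make up the package
    for module_info in pkgutil.iter_modules(package.__path__):
        full_name = '%s.%s' % (package_name, module_info.name)
        importlib.import_module(full_name)
        imported.append(full_name)
    return imported
```

Calling, say, autodiscover('myterms') once at startup would then be enough to populate Terms.termsets with everything defined in that (hypothetical) package.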

With a metaclass, the feature set can grow easily and organically as you find other things you want your mini-language to do.
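On Python 3.6 and later, a similar registry can also be built without a custom metaclass by using the `__init_subclass__` hook (PEP 487), which runs once for every new subclass. This is a swapped-in alternative, not what the answer above uses. In this sketch it wraps functions as static methods, registers each subclass, and provides Terms.sqrt-style access by setting the attribute on Terms directly instead of going through a metaclass `__getattr__`.

```python
from types import FunctionType


class Terms:
    termsets = {}  # Keep track of sets

    def __init_subclass__(cls, **kwargs):
        # Called once for each new subclass, right after it is created.
        super().__init_subclass__(**kwargs)
        # Wrap plain functions defined in the subclass body as static methods.
        for key, value in list(vars(cls).items()):
            if isinstance(value, FunctionType):
                setattr(cls, key, staticmethod(value))
        # Register the subclass and expose it as an attribute, e.g. Terms.sqrt
        Terms.termsets[cls.shortname] = cls
        setattr(Terms, cls.shortname, cls)


class SqrtTerms(Terms):
    display = 'Square Roots'
    shortname = 'sqrt'

    def size(d):
        return d

    def labels(columns):
        return ['sqrt(%s)' % c for c in columns]


class PairwiseProductTerms(Terms):
    display = 'Pairwise Products'
    shortname = 'pairprod'

    def size(d):
        return d * (d - 1) // 2
```

Because the subclasses become real attributes of Terms, looking up an unknown name raises the ordinary AttributeError without any custom `__getattr__`.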

Purely static classes in Python - use a metaclass, a class decorator, or something else?
