Question

I'm working with 2 data sets of roughly 100,000 values each. The data sets are just lists, and each item in a list is an instance of a small class.

class Datum(object):
    def __init__(self, value, dtype, source, index1=None, index2=None):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2

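For each datum in one list there is a matching datum in the other list with the same dtype, source, index1 and index2, which I use to sort the two data sets so that they align. I then do various work with the values of the matched data points, which are always floats.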

Currently, if I want to determine the relative values of the floats in one data set, I do something like this.

minimum = min([x.value for x in data])
for datum in data:
    datum.value -= minimum

However, it would be nice to have my custom class inherit from float and be able to act like this.

minimum = min(data)
data = [x - minimum for x in data]

I tried the following.

class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new

However, doing

data = [x - minimum for x in data]

strips all of the extra attributes (dtype, source, index1, index2).

How should I set up a class that functions like a float, but holds on to the extra data that I instantiate it with?

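UPDATE: I do many types of mathematical operations beyond subtraction, so rewriting all of the methods that work with a float would be very troublesome, and frankly I'm not sure I could rewrite them all properly.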

Answer 1

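I suggest subclassing float and using a couple of decorators to "capture" the float output of any method (except for __new__, of course) and return a Datum object instead of a float object.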

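First we write the method decorator (which isn't actually used as a decorator below; it's just a function that modifies the output of another function, AKA a wrapper function):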

def mydecorator(f,cls):
    #f is the method being modified, cls is its class (in this case, Datum)
    def func_wrapper(*args,**kwargs):
        #*args and **kwargs are all the arguments that were passed to f
        newvalue = f(*args,**kwargs)
        #newvalue now contains the output float would normally produce
        ##Now get cls instance provided as part of args (we need one
        ##if we're going to reattach instance information later):
        try:
            self = args[0]
            ##Now check to make sure new value is an instance of some numerical 
            ##type, but NOT a bool or a cls type (which might lead to recursion)
            ##Including ints so things like modulo and round will work right
            if isinstance(newvalue,(float,int)) and not isinstance(newvalue,bool) and type(newvalue) != cls:
                ##If newvalue is a float or int, now we make a new cls instance using the
                ##newvalue for value and using the previous self instance information (args[0])
                ##for the other fields
                return cls(newvalue,self.dtype,self.source,self.index1,self.index2)
        #IndexError raised if no args provided, AttributeError raised if self isn't a cls instance
        except (IndexError, AttributeError): 
            pass
        ##If newvalue isn't numerical, or we don't have a self, just return what
        ##float would normally return
        return newvalue
    #the function has now been modified and we return the modified version
    #to be used instead of the original version, f
    return func_wrapper

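That first decorator only applies to the method it is attached to. But we want it to decorate all (actually, almost all) of the methods inherited from float (well, those that appear in float's __dict__, anyway). This second decorator applies our first decorator to all of those methods in the float subclass, except for the ones listed as exceptions (see this answer):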

def for_all_methods_in_float(decorator,*exceptions):
    def decorate(cls):
        for attr in float.__dict__:
            if callable(getattr(float, attr)) and attr not in exceptions:
                setattr(cls, attr, decorator(getattr(float, attr),cls))
        return cls
    return decorate
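
To get a feel for what that loop will touch, it can help to list the callables in float.__dict__ first. The snippet below is only an inspection aid and not part of the answer's code; the variable name `wrapped` is made up here. The exact list depends on the Python version, and it usually includes non-arithmetic methods such as __repr__ and __hash__ alongside the operators.

```python
# List every callable attribute defined directly on float, skipping __new__.
# This mirrors the selection rule used by for_all_methods_in_float.
wrapped = sorted(
    name for name in float.__dict__
    if callable(getattr(float, name)) and name != "__new__"
)
print(len(wrapped), "methods would be wrapped")
print(wrapped)
```

Anything in that list that should keep returning a plain value can be passed as an extra exception name.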

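Now we write the subclass much the same as before, but decorated, and excluding __new__ from decoration (I guess we could also exclude __init__, but __init__ doesn't return anything anyway):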

@for_all_methods_in_float(mydecorator,'__new__')
class Datum(float):
    def __new__(klass, value, dtype="dtype", source="source", index1="index1", index2="index2"):
        return super(Datum,klass).__new__(klass,value)
    def __init__(self, value, dtype="dtype", source="source", index1="index1", index2="index2"):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2
        super(Datum,self).__init__()

Here are our test procedures; iteration seems to work correctly:

d1 = Datum(1.5)
d2 = Datum(3.2)
d3 = d1+d2
assert d3.source == 'source'
L=[d1,d2,d3]
d4=max(L)
assert d4.source == 'source'
L = [i for i in L]
assert L[0].source == 'source'
assert type(L[0]) == Datum
minimum = min(L)
assert [x - minimum for x in L][0].source == 'source'

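Notes:

I'm using Python 3. Not sure whether that will make a difference for you.
This approach effectively overrides every method of float other than the exceptions, even the ones whose result isn't modified. There may be side effects to this (subclassing a built-in and then overriding all of its methods), e.g. a performance hit; I really don't know.
This will also decorate nested classes.
The same approach could also be implemented using a metaclass.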
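
If the side effects of wrapping every method are a concern, one narrower variant is to wrap only a whitelist of arithmetic methods. The following is a minimal sketch under that assumption, not the code from this answer: the names `ARITHMETIC` and `keep_metadata` are hypothetical, and the whitelist is deliberately incomplete (no floor division, modulo or rounding). Methods such as __hash__ and __int__ are left alone, so they keep returning plain integers.

```python
# Special methods whose float result should be turned back into a Datum.
# The list is an assumption of this sketch; extend it to cover what you use.
ARITHMETIC = (
    "__add__", "__radd__", "__sub__", "__rsub__",
    "__mul__", "__rmul__", "__truediv__", "__rtruediv__",
    "__pow__", "__rpow__", "__neg__", "__pos__", "__abs__",
)

def keep_metadata(cls):
    """Class decorator: re-wrap the results of the methods in ARITHMETIC."""
    def make_method(name):
        base = getattr(float, name)

        def method(self, *args):
            result = base(self, *args)
            if result is NotImplemented:
                # Let Python try the other operand's reflected method.
                return result
            return cls(result, self.dtype, self.source, self.index1, self.index2)

        method.__name__ = name
        return method

    for name in ARITHMETIC:
        setattr(cls, name, make_method(name))
    return cls

@keep_metadata
class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new

data = [Datum(3.5, "a", "s1"), Datum(1.5, "b", "s2")]
minimum = min(data)
shifted = [x - minimum for x in data]
print(shifted[0], shifted[0].dtype, type(shifted[0]).__name__)
```

The trade-off is that any operation missing from the whitelist silently falls back to a plain float, so the list has to match the operations actually used.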

Answer 2

The problem is that when you do:

x - minimum

in terms of types you are doing either:

datum - float, or datum - integer

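Either way, Datum itself does not define how to do either of those, so Python looks at the parent classes of the arguments, if it can. Since Datum is a type of float, it can easily use float's subtraction, and the calculation ends up being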

float - float

which will obviously result in a float. Python has no way of knowing how to construct your Datum object unless you tell it.
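
A short check makes this visible. It reuses the float subclass from the question as-is; the variable name `result` is only for illustration.

```python
class Datum(float):
    def __new__(cls, value, dtype, source, index1=None, index2=None):
        new = float.__new__(cls, value)
        new.dtype = dtype
        new.source = source
        new.index1 = index1
        new.index2 = index2
        return new

d = Datum(23.0, "float", "me", index1=1)
result = d - 16  # handled by the inherited float.__sub__

print(type(d).__name__)          # Datum
print(type(result).__name__)     # float
print(hasattr(result, "dtype"))  # False
```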

To solve this you either need to implement the mathematical operators, so that Python knows how to do datum - float, or come up with a different design.

Assuming that 'dtype', 'source', index1 & index2 need to stay the same after a calculation, then as an example your class needs:

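def __sub__(self, other):
    # Assumes the original Datum class, which stores the number in self.value;
    # the metadata is carried over unchanged
    return Datum(self.value - other, self.dtype, self.source, self.index1, self.index2)

This should work (not tested).

This will now allow you to do this:

d = Datum(23.0, dtype="float", source="me", index1=1)
e = d - 16
print(e.value, e.dtype, e.source, e.index1, e.index2)

Which should result in:

7.0 float me 1 None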
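
Since the question involves subtracting one Datum from another and calling min() on the list, a slightly fuller sketch of the same idea may be useful. It assumes the original, non-float Datum class from the question; the helper `_apply` and the choice to copy the metadata from the left operand are assumptions made here, not part of the answer above.

```python
import operator

class Datum(object):
    def __init__(self, value, dtype, source, index1=None, index2=None):
        self.value = value
        self.dtype = dtype
        self.source = source
        self.index1 = index1
        self.index2 = index2

    def _apply(self, op, other, reflected=False):
        # Unwrap another Datum so that Datum - Datum works as well.
        other_value = other.value if isinstance(other, Datum) else other
        if reflected:
            result = op(other_value, self.value)
        else:
            result = op(self.value, other_value)
        # The metadata is always copied from self.
        return Datum(result, self.dtype, self.source, self.index1, self.index2)

    def __add__(self, other):
        return self._apply(operator.add, other)

    def __radd__(self, other):
        return self._apply(operator.add, other, reflected=True)

    def __sub__(self, other):
        return self._apply(operator.sub, other)

    def __rsub__(self, other):
        return self._apply(operator.sub, other, reflected=True)

    def __lt__(self, other):
        # Ordering by value lets min() and max() work on a list of Datum.
        other_value = other.value if isinstance(other, Datum) else other
        return self.value < other_value

data = [Datum(23.0, "float", "me", index1=1), Datum(16.0, "float", "you")]
minimum = min(data)
shifted = [x - minimum for x in data]
print([(x.value, x.source) for x in shifted])
```

Each further operator (multiplication, division and so on) needs its own pair of methods in the same style, which is the maintenance cost the question's update mentions.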

How do I set up a class that works with built-ins and functions like a float, but holds on to extra data?