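I am trying to store a list of dictionaries with (possibly) overlapping keys and keep track of the average value of each key across all of these dictionaries. I've created a class that mostly works, but it fails to update the averages if I directly modify a dictionary in the list. Could you point me to an implementation that tracks both the replacement of whole dictionaries in the list and modifications made to the dictionaries within the list?
I want to be able to access either a particular dictionary in the list or one of the averaged values by indexing an instance of the class like a list or like a dictionary, respectively. Below I've provided a (perhaps slightly more than) minimal working example of the class.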
import numpy as np
from functools import reduce  # builtin in Python 2, needs this import in Python 3


class ListOfDicts(object):
    """
    Object to store
    (i) a list of dictionaries, self.y, and
    (ii) a dictionary, self.x, which maps each unique key
         of the dicts in self.y to the average of the value
         of that key across all dicts in self.y
    """
    def __init__(self, list_):
        """
        self.y - list of dictionaries
        self.x - dictionary containing average of all entries across all
                 dicts in self.y
        """
        # Accept either a list of dicts or a list of sequences
        # (for sequences, the positions become the keys)
        try:
            self.y = [{k: v for k, v in i.items()} for i in list_]
        except AttributeError:
            self.y = [{k: v for k, v in enumerate(i)} for i in list_]
        self.x = self._update_x()

    def __repr__(self):
        cls = self.__class__.__name__
        return '%r(%r)' % (cls, repr(self.y))

    def __len__(self):
        return len(self.y)

    def __iter__(self):
        return iter(self.y)

    def iterkeys(self):
        return iter(self.y)

    def __contains__(self, key):
        """
        Returns true if key is either an index of self.y or
        a key of a dict within self.y.
        The keys of all dicts within self.y are keys of self.x.
        """
        return (key in self.y) or (key in self.x)

    def __getitem__(self, key):
        """
        If key is an index of self.y, get the corresponding dict.
        If instead the key is a key of self.x, return the value of x[key].
        """
        try:
            return self.y[key]
        except TypeError:
            return self.x[key]

    def __setitem__(self, key, valdict):
        """
        Set the value of a dict of self.y to the dictionary valdict,
        then update the dictionary of average values, self.x.
        If key is not an index of self.y, this will throw an error.
        """
        self.y[key] = valdict
        self.x = self._update_x()

    def __delitem__(self, key):
        """
        Remove a dict from self.y, then update self.x.
        This does not relabel other dictionary indices.
        """
        del self.y[key]
        # Fixed: _update_x takes no argument besides the bound self
        self.x = self._update_x()

    def _update_x(self):
        """
        Calculate an average of the values of each unique key, k,
        in the dictionaries within self.y:
            { k: <s[k] for s in self.y> },
        where <s[k] ... > is an average over all values of k in
        each dictionary in y.
        """
        # Find the set of unique keys in the dictionaries of self.y
        # (the initial set() keeps reduce from failing when self.y is empty)
        keys = reduce(lambda x, y: x | y, [set(i.keys()) for i in self.y], set())
        # Calculate averages for each key and store them in a dictionary
        temp = {k: np.average([s[k] for s in self.y if k in s])
                for k in keys}
        return temp
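I've defined y as the list of dictionaries and x as the dictionary of averages. I've written __getitem__ to first look for an index in the list y and, if that fails with a TypeError, to look up the key in the dictionary of averages x. I've written __setitem__ to replace the specified dictionary in y with a new dictionary and then recompute the averages in x.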
Some examples where the class behaves as I'd like:
>>> test = ListOfDicts([{'a':0.5, 'b':0.5},{'b':0.4, 'c':0.6}])
>>> test
'ListOfDicts'("[{'a': 0.5, 'b': 0.5}, {'c': 0.6, 'b': 0.4}]")
>>> test.y
[{'a': 0.5, 'b': 0.5}, {'b': 0.4, 'c': 0.6}]
>>> test.x
{'a': 0.5, 'b': 0.45000000000000001, 'c': 0.59999999999999998}
>>> test[0]
{'a': 0.5, 'b': 0.5}
>>> test[1]
{'b': 0.4, 'c': 0.6}
>>> test['a']
0.5
>>> test['b']
0.45000000000000001
>>> test[0] = {'a':0.3, 'b':0.4, 'd':0.3}
>>> test.x
{'a': 0.29999999999999999, 'b': 0.40000000000000002,
'c': 0.59999999999999998, 'd': 0.29999999999999999}
The following produces undesired behavior:
>>> test[0]['a'] = 0.0
>>> test[0]['b'] = 0.7
>>> test.x
{'a': 0.29999999999999999, 'b': 0.40000000000000002,
'c': 0.59999999999999998, 'd': 0.29999999999999999}
I would like test.x to instead be:
>>> test.x
{'a': 0.0, 'b': 0.55000000000000004, 'c': 0.59999999999999998,
'd': 0.29999999999999999}
That is, when I modify the dictionary test[0] in place, the dictionary of averages x is not updated, and printing test.x returns stale values.
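To make the cause concrete, here is a minimal sketch using a stripped-down, hypothetical `Tracker` class (not the class above) that just logs which of its hooks run. As far as I can tell, `obj[0]['a'] = ...` only calls `__getitem__` on the container and then `__setitem__` on the plain inner dict, so the container never gets a chance to recompute anything:

```python
class Tracker(object):
    """Hypothetical stand-in for ListOfDicts that records which hooks run."""
    def __init__(self, dicts):
        self.y = [dict(d) for d in dicts]
        self.calls = []

    def __getitem__(self, index):
        self.calls.append('getitem')
        return self.y[index]

    def __setitem__(self, index, value):
        self.calls.append('setitem')
        self.y[index] = value


t = Tracker([{'a': 0.5, 'b': 0.5}])

t[0]['a'] = 0.0                      # nested assignment on the inner dict
calls_after_nested = list(t.calls)   # the container saw only a lookup

t[0] = {'a': 0.3}                    # replacing the whole dict
calls_after_replace = list(t.calls)  # now the container's __setitem__ ran
```

So any fix presumably has to make the inner dictionaries themselves report changes, or avoid caching x altogether.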
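I have two questions:
1. Is there a good way to fix this, so that any modification to a dictionary in y triggers an update of x?
2. Is there a better way to achieve my overall goal of keeping track of averages (and other derived quantities) of a list of dictionaries as the underlying dictionaries are modified?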
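One particular detail about how the class will be used: x will be read much more often than y is changed, so I would rather not recompute x every time it is accessed.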
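One direction that might satisfy this constraint, as a rough sketch only: wrap each inner dict in a dict subclass that flips a "dirty" flag on its owner when an item changes, and recompute x lazily the next time it is read. The names `NotifyingDict`, `LazyAverages`, `_mark_dirty` and `recomputations` are hypothetical, plain-Python means stand in for `np.average`, and only `__setitem__` and `__delitem__` are intercepted here (mutations through `update`, `pop`, `setdefault` and so on would still go unnoticed):

```python
class NotifyingDict(dict):
    """Hypothetical dict that calls on_change() whenever an item is set or deleted."""
    def __init__(self, data, on_change):
        dict.__init__(self, data)
        self._on_change = on_change

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, value)
        self._on_change()

    def __delitem__(self, key):
        dict.__delitem__(self, key)
        self._on_change()


class LazyAverages(object):
    """Hypothetical container that recomputes averages only when read while stale."""
    def __init__(self, dicts):
        self.y = [NotifyingDict(d, self._mark_dirty) for d in dicts]
        self._x = {}
        self._dirty = True
        self.recomputations = 0  # counter kept only to illustrate the caching

    def _mark_dirty(self):
        self._dirty = True

    def __getitem__(self, index):
        return self.y[index]

    def __setitem__(self, index, valdict):
        self.y[index] = NotifyingDict(valdict, self._mark_dirty)
        self._mark_dirty()

    @property
    def x(self):
        if self._dirty:
            keys = set()
            for d in self.y:
                keys.update(d)
            self._x = {}
            for k in keys:
                vals = [d[k] for d in self.y if k in d]
                self._x[k] = sum(vals) / float(len(vals))
            self._dirty = False
            self.recomputations += 1
        return self._x


test = LazyAverages([{'a': 0.3, 'b': 0.4, 'd': 0.3}, {'b': 0.4, 'c': 0.6}])
first = test.x       # first read computes the averages
test[0]['a'] = 0.0   # marks the cache stale, no recomputation yet
test[0]['b'] = 0.7
second = test.x      # recomputed once, on demand
third = test.x       # served from the cache
```

With this layout, two in-place modifications followed by two reads cost a single recomputation, which seems to fit the read-heavy usage described above. I'm not sure whether it is the cleanest approach, though.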
Please let me know if you would like more details or have any other questions. Thanks in advance for your help!