Store each key's values as an array in a dictionary

Time: 2019-03-21 11:05:05

Tags: python arrays python-3.x dictionary for-loop

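I want to normalize all the values in the dictionary data and store them in another dictionary with the same keys, where each key's values are stored in a 1D array. So I did the following: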

>>> data = {1: [0.6065306597126334], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}

>>> norm = {k: [v / sum(vals) for v in vals] for k, vals in data.items()} 

>>> norm
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

Now suppose one of the keys in data contains only zero values, for example the value of the first key, 1:

>>> data = {1: [0.0], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}

Normalizing this dictionary's values then gives [nan] for that key, because of the division by zero:

>>> norm = {k: [v / sum(vals) for v in vals] for k, vals in data.items()}

__main__:1: RuntimeWarning: invalid value encountered in double_scalars
>>> norm
{1: [nan], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}
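The RuntimeWarning and the nan suggest that the values in this session were NumPy scalars (such as np.float64) rather than plain Python floats; that is an inference from the warning, not something the question states. With plain floats, 0.0 / 0.0 raises ZeroDivisionError instead, as Answer 2 below shows. A minimal sketch contrasting the two (the helper names python_divide and numpy_divide are hypothetical; the exact warning text also varies between NumPy versions):

```python
import math

import numpy as np


def python_divide(a, b):
    """Divide two plain Python floats, returning the ZeroDivisionError instead of raising it."""
    try:
        return a / b
    except ZeroDivisionError as exc:
        return exc


def numpy_divide(a, b):
    """Divide two NumPy float64 scalars; 0/0 yields nan instead of raising."""
    # Suppress the RuntimeWarning NumPy would otherwise emit for 0/0.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.float64(a) / np.float64(b)


print(type(python_divide(0.0, 0.0)).__name__)  # ZeroDivisionError
print(math.isnan(numpy_divide(0.0, 0.0)))      # True
```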

So I added an if statement to handle this case, but now I can't store each key's values as a 1D array.

Code

>>> norm = {}
>>> for k, vals in data.items():
...     values = []
...     if sum(vals) == 0:
...         values.append(list(vals))
...     else:
...         for v in vals:
...             values.append(list([v/sum(vals)]))
...     norm[k] = values
... 
>>> norm
{1: [[1.0]], 2: [[0.4498162176582741], [0.4498162176582741], [0.10036756468345168]], 3: [[0.4498162176582741], [0.4498162176582741], [0.10036756468345168]], 4: [[0.5], [0.5]]}

I want the norm dictionary to look like this:

norm = {1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

Also, when one of the keys in data contains only zero values, is there a better way to normalize the dictionary? I don't think my solution is efficient!

P.S.: At the end of the for loop I tried norm[k] = np.array(values) instead of norm[k] = values, but the result was still not what I wanted.
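The P.S. result is expected: the loop builds one single-element list per value, so np.array turns that list of lists into a 2-D column of shape (n, 1) rather than a 1-D array. A short sketch using one key's values from the question (the variable names are illustrative):

```python
import numpy as np

vals = [0.6065306597126334, 0.6065306597126334, 0.1353352832366127]
s = sum(vals)

# What the question's loop builds: one single-element list per value.
nested = [[v / s] for v in vals]
print(np.array(nested).shape)          # (3, 1): a 2-D column, not a 1-D array

# Flattening fixes the shape, but building a flat list in the first place is simpler.
print(np.array(nested).ravel().shape)  # (3,)
flat = [v / s for v in vals]
print(np.array(flat).shape)            # (3,)
```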

3 answers:

Answer 0 (score: 1)

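As mentioned above, append adds a single element to a list, and that element can itself be a list, which is why you currently end up with lists inside a list. Ideally you should use extend, which concatenates the elements of another list (or any iterable) onto the first one.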
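A minimal sketch of the difference, followed by the question's loop rewritten with extend. The sample data is a subset of the question's, and keeping a zero-sum list unchanged is just one possible choice for that case:

```python
a = [1.0]
a.append([2.0])   # adds the list object itself as one element
print(a)          # [1.0, [2.0]]

b = [1.0]
b.extend([2.0])   # adds each element of the argument
print(b)          # [1.0, 2.0]

data = {1: [0.0], 4: [0.6065306597126334, 0.6065306597126334]}

norm = {}
for k, vals in data.items():
    values = []
    s = sum(vals)                           # compute the sum once per key
    if s == 0:
        values.extend(vals)                 # keep the zeros, as a flat list
    else:
        values.extend(v / s for v in vals)  # extend accepts any iterable
    norm[k] = values
print(norm)       # {1: [0.0], 4: [0.5, 0.5]}
```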

Answer 1 (score: 1)

As mentioned in the other answer, extend solves your problem. If you really want to use append, you can append the first element of the list instead:

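norm = {}
for k, vals in data.items():
    values = []
    s = sum(vals)  # compute the sum once per key
    if s == 0:
        # Append the first (zero) value itself rather than a list;
        # note this keeps only one element if vals holds several zeros.
        values.append(vals[0])
    else:
        for v in vals:
            # [v / s][0] is just v / s: the float is taken out of the
            # one-element list before it is appended.
            values.append([v / s][0])
    norm[k] = values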

For examples of append vs. extend, see difference between append vs extend list methods in python.

As for optimization: the for loop cannot be removed entirely, but you can simplify the solution while keeping it readable:

norm = {}
for k, vals in data.items():
    s = sum(vals)  # compute the sum once, not once per element
    if s == 0:
        norm[k] = list(vals)  # copy so norm does not share the list with data
    else:
        norm[k] = [x / s for x in vals]
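Since the title asks for arrays, here is a sketch of the same logic producing 1-D NumPy arrays, assuming NumPy is acceptable (the question's P.S. already uses np.array). The function name normalize_to_arrays is hypothetical. For lists as short as these, NumPy's per-call overhead may outweigh any speedup, so treat this as a way to get 1-D arrays rather than a benchmarked optimization:

```python
import numpy as np


def normalize_to_arrays(data):
    """Return a new dict mapping each key to a 1-D float array that sums to 1.

    Keys whose values sum to 0 keep their original values (as a copy),
    mirroring the if/else in the loop above.
    """
    norm = {}
    for k, vals in data.items():
        arr = np.asarray(vals, dtype=float)  # 1-D array of this key's values
        s = arr.sum()                        # computed once per key
        norm[k] = arr / s if s != 0 else arr.copy()
    return norm


data = {1: [0.0], 4: [0.6065306597126334, 0.6065306597126334]}
norm = normalize_to_arrays(data)
print(norm[1], norm[4])   # [0.] [0.5 0.5]
```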

Answer 2 (score: 0)

Your dict/list comprehension fails when sum(vals) == 0:

>>> data = {1: [0.0], 2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 3: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127], 4: [0.6065306597126334, 0.6065306597126334]}
>>> {k: [v / sum(vals) for v in vals] for k, vals in data.items()}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <dictcomp>
  File "<stdin>", line 1, in <listcomp>
ZeroDivisionError: float division by zero

You can introduce a conditional (ternary) expression to handle this case:

>>> {k: [v / sum(vals) if sum(vals)!=0 else 1.0 for v in vals] for k, vals in data.items()}
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

If you want to avoid evaluating sum(vals) multiple times:

>>> {k: [v / s if s!=0 else 1.0 for v in vals] for k,vals,s in ((k, vals, sum(vals)) for k, vals in data.items())}
{1: [1.0], 2: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 3: [0.4498162176582741, 0.4498162176582741, 0.10036756468345168], 4: [0.5, 0.5]}

((k, vals, sum(vals)) for k, vals in data.items()) is a generator expression that yields k, vals, and sum(vals) for each item, so the sum is computed only once per key.
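An alternative to the generator trick is to move the per-key logic into a small named helper, which also computes the sum once per key. The names normalize and zero_fill are hypothetical, and the default of 1.0 matches the ternary expression above. Note that for a list holding several zeros, filling with 1.0 means the result no longer sums to 1; 1 / len(vals) would be an option if that matters:

```python
def normalize(vals, zero_fill=1.0):
    """Scale vals so they sum to 1; if they sum to 0, return zero_fill per element."""
    s = sum(vals)  # evaluated once per key
    if s == 0:
        return [zero_fill] * len(vals)
    return [v / s for v in vals]


data = {
    1: [0.0],
    2: [0.6065306597126334, 0.6065306597126334, 0.1353352832366127],
    4: [0.6065306597126334, 0.6065306597126334],
}
norm = {k: normalize(vals) for k, vals in data.items()}
print(norm[1], norm[4])   # [1.0] [0.5, 0.5]
```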