Question

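I have a problem similar to the one discussed here: Concatenating dictionaries of numpy arrays (avoiding manual loops if possible)

I am looking for a way to concatenate the values of two Python dictionaries that contain numpy arrays of arbitrary size, while avoiding a manual loop over the dictionary keys. For example: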

import numpy as np

# Create first dictionary
n1 = 3
s = np.random.randint(1,101,n1)
n2 = 2
r = np.random.rand(n2)
d = {"r":r,"s":s}
print("d = ", d)

# Create second dictionary
n3 = 1
s = np.random.randint(1,101,n3)
n4 = 3
r = np.random.rand(n4)
d2 = {"r":r,"s":s}
print("d2 = ", d2)

# Some operation to combine the two dictionaries...
d = SomeOperation(d,d2)

# Updated dictionary
print("d3 = ", d)

which gives the output

>> d =  {'s': array([75, 25, 88]), 'r': array([ 0.1021227 ,  0.99454874])}
>> d2 =  {'s': array([78]), 'r': array([ 0.27610587,  0.57037473, 0.59876391])}
>> d3 =  {'s': array([75, 25, 88, 78]), 'r': array([ 0.1021227 ,  0.99454874, 0.27610587,  0.57037473, 0.59876391])}

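That is, if a key already exists, the new numpy array should be appended to the array already stored under that key.

The solution proposed in the earlier discussion, which uses the pandas package, does not work here because it requires arrays of equal length (n1 = n2 and n3 = n4).

Does anyone know the best way to do this while minimizing the use of slow manual for loops? (I would like to avoid loops because the dictionaries I want to combine may have hundreds of keys.)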

Thanks (also to "Aim" for formulating a very clear question)!

Answer 1

One approach is to use a dictionary of Series (i.e. the values are pandas Series rather than arrays):

In [11]: d2
Out[11]: {'r': array([ 0.3536318 ,  0.29363604,  0.91307454]), 's': array([46])}

In [12]: d2 = {name: pd.Series(arr) for name, arr in d2.items()}

In [13]: d2
Out[13]:
{'r': 0    0.353632
1    0.293636
2    0.913075
dtype: float64,
 's': 0    46
dtype: int64}

That way you can pass it to the DataFrame constructor, which pads the shorter columns with NaN:

In [14]: pd.DataFrame(d2)
Out[14]:
          r   s
0  0.353632  46
1  0.293636 NaN
2  0.913075 NaN
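
The answer stops at building a single DataFrame, so here is a minimal sketch of how the two dictionaries from the question might then be combined. The function names `concat_via_pandas` and `concat_per_key` are hypothetical and not part of the original answer. The first follows the Series approach above; the second is a plain per-key `np.concatenate`, shown for comparison because it preserves integer dtypes and does not rely on NaN as padding.

```python
import numpy as np
import pandas as pd


def concat_via_pandas(d1, d2):
    """Stack two dicts of unequal-length arrays using DataFrames of Series."""
    frames = [
        pd.DataFrame({name: pd.Series(arr) for name, arr in d.items()})
        for d in (d1, d2)
    ]
    stacked = pd.concat(frames, ignore_index=True)
    # NaN marks the padding added for shorter columns. Caveats: dropping it
    # also drops any genuine NaN values, and padded integer columns come
    # back as floats.
    return {name: stacked[name].dropna().to_numpy() for name in stacked.columns}


def concat_per_key(d1, d2):
    """Append the arrays of d2 to those of d1, key by key (dtypes preserved)."""
    merged = dict(d1)
    for name, arr in d2.items():
        merged[name] = np.concatenate([merged[name], arr]) if name in merged else arr
    return merged


d = {"r": np.array([0.1, 0.9]), "s": np.array([75, 25, 88])}
d2 = {"r": np.array([0.2, 0.5, 0.6]), "s": np.array([78])}

print(concat_via_pandas(d, d2))
print(concat_per_key(d, d2))
```

The second version still loops over the keys, but the loop body is a single vectorized call per key, so for a few hundred keys the loop itself is unlikely to be the bottleneck.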

Concatenating dictionaries of numpy arrays of different lengths (avoiding manual loops if possible)
