Question

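I'm looking for a way to concatenate the values in two Python dictionaries that contain numpy arrays, while avoiding a manual loop over the dictionary keys. For example: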

import numpy as np

# Create first dictionary
n = 5
s = np.random.randint(1,101,n)
r = np.random.rand(n)
d = {"r":r,"s":s}
print("d = ", d)

# Create second dictionary
n = 2
s = np.random.randint(1,101,n)
r = np.random.rand(n)
t = np.array(["a","b"])
d2 = {"r":r,"s":s,"t":t}
print("d2 = ", d2)

# Some operation to combine the two dictionaries...
d = SomeOperation(d,d2)

# Updated dictionary
print("d3 = ", d)

which gives the output:

>> d =  {'s': array([75, 25, 88, 54, 82]), 'r': array([ 0.1021227 ,  0.99454874, 0.38680718,  0.98720877,  0.8662894 ])}
>> d2 =  {'s': array([78, 92]), 'r': array([ 0.27610587,  0.57037473]), 't': array(['a', 'b'], dtype='|S1')}
>> d3 =  {'s': array([75, 25, 88, 54, 82, 78, 92]), 'r': array([ 0.1021227 ,  0.99454874, 0.38680718,  0.98720877,  0.8662894, 0.27610587,  0.57037473]), 't': array(['a', 'b'], dtype='|S1')}

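That is, if a key already exists, the new values are appended to the numpy array stored under that key; otherwise the key is added with its array.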

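Does anyone know the best way to do this while minimizing the use of slow manual for loops? (I'd like to avoid loops because the dictionaries I want to combine could have hundreds of keys.)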

Thanks!

Answer 1

You can use pandas:

from __future__ import print_function, division
import pandas as pd
import numpy as np

# Create first dictionary
n = 5
s = np.random.randint(1,101,n)
r = np.random.rand(n)
d = {"r":r,"s":s}
df = pd.DataFrame(d)
print(df)

# Create second dictionary
n = 2
s = np.random.randint(1,101,n)
r = np.random.rand(n)
t = np.array(["a","b"])
d2 = {"r":r,"s":s,"t":t}
df2 = pd.DataFrame(d2)
print(df2)

print(pd.concat([df, df2]))

Output:

          r   s
0  0.551402  49
1  0.620870  34
2  0.535525  52
3  0.920922  13
4  0.708109  48
          r   s  t
0  0.231480  43  a
1  0.492576  10  b
          r   s    t
0  0.551402  49  NaN
1  0.620870  34  NaN
2  0.535525  52  NaN
3  0.920922  13  NaN
4  0.708109  48  NaN
0  0.231480  43    a
1  0.492576  10    b
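
The concatenated frame pads the missing "t" entries with NaN, whereas the question asks for a dictionary of arrays without padding. A minimal sketch of one way to get back to that form is given here, assuming every array within a dictionary has the same length (which pd.DataFrame requires) and that the data contains no genuine NaN values, since dropna would remove those too. The helper names concat_via_pandas and concat_via_numpy are hypothetical and used only for illustration. The second helper skips pandas altogether and uses a comprehension over the keys.

```python
import numpy as np
import pandas as pd


def concat_via_pandas(a, b):
    """Concatenate two dicts of equal-length 1-D arrays through pandas."""
    combined = pd.concat([pd.DataFrame(a), pd.DataFrame(b)], ignore_index=True)
    # Keys missing from one dict were padded with NaN; drop that padding.
    # Caveat: this also drops any genuine NaN values in the data.
    return {key: combined[key].dropna().to_numpy() for key in combined.columns}


def concat_via_numpy(a, b):
    """Same idea without pandas, using a comprehension over the keys."""
    keys = list(a) + [k for k in b if k not in a]
    return {k: np.concatenate([x[k] for x in (a, b) if k in x]) for k in keys}


# Small fixed inputs instead of random ones, so the result is easy to check
d = {"r": np.array([0.1, 0.2, 0.3]), "s": np.array([75, 25, 88])}
d2 = {"r": np.array([0.4, 0.5]), "s": np.array([78, 92]), "t": np.array(["a", "b"])}

d3 = concat_via_pandas(d, d2)
d3_np = concat_via_numpy(d, d2)
print(d3)
print(d3_np)
```

The comprehension in concat_via_numpy is still a loop over keys, but it runs once per key rather than once per element, so with a few hundred keys its cost is likely negligible next to the array concatenation itself.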

Concatenating dictionaries of numpy arrays (avoiding manual loops if possible)
