numpy: look up values in a third array based on matching positions in arrays 1 and 2

Time: 2016-08-17 14:40:35

Tags: python-2.7 pandas numpy

I suspect there is a fast way to do this. I have 3 arrays of the same size, holding x, y and z coordinates, like so:

In[85]: xxn
Out[85]: 
array([ 0.08333333,  0.08333333,  0.08333333,  0.08333333,  0.08333333,
        0.08333333,  0.08333333,  0.08333333,  0.08333333,  0.25      ,
        0.25      ,  0.25      ,  0.25      ,  0.25      ,  0.25      ,
        0.25      ,  0.25      ,  0.25      ,  0.5       ,  0.5       ,
        0.5       ,  0.5       ,  0.5       ,  0.5       ,  0.5       ,
        0.5       ,  0.5       ,  1.        ,  1.        ,  1.        ,
        1.        ,  1.        ,  1.        ,  1.        ,  1.        ,
        1.        ,  2.        ,  2.        ,  2.        ,  2.        ,
        2.        ,  2.        ,  2.        ,  2.        ,  2.        ,
        3.        ,  3.        ,  3.        ,  3.        ,  3.        ,
        3.        ,  3.        ,  3.        ,  3.        ,  4.        ,
        4.        ,  4.        ,  4.        ,  4.        ,  4.        ,
        4.        ,  4.        ,  4.        ,  5.        ,  5.        ,
        5.        ,  5.        ,  5.        ,  5.        ,  5.        ,
        5.        ,  5.        ])
In[86]: yyn
Out[86]: 
array([ 1306.89 ,  1524.705,  1742.52 ,  1960.335,  2178.15 ,  2395.965,
        2613.78 ,  2831.595,  3049.41 ,  1306.89 ,  1524.705,  1742.52 ,
        1960.335,  2178.15 ,  2395.965,  2613.78 ,  2831.595,  3049.41 ,
        1306.89 ,  1524.705,  1742.52 ,  1960.335,  2178.15 ,  2395.965,
        2613.78 ,  2831.595,  3049.41 ,  1306.89 ,  1524.705,  1742.52 ,
        1960.335,  2178.15 ,  2395.965,  2613.78 ,  2831.595,  3049.41 ,
        1306.89 ,  1524.705,  1742.52 ,  1960.335,  2178.15 ,  2395.965,
        2613.78 ,  2831.595,  3049.41 ,  1306.89 ,  1524.705,  1742.52 ,
        1960.335,  2178.15 ,  2395.965,  2613.78 ,  2831.595,  3049.41 ,
        1306.89 ,  1524.705,  1742.52 ,  1960.335,  2178.15 ,  2395.965,
        2613.78 ,  2831.595,  3049.41 ,  1306.89 ,  1524.705,  1742.52 ,
        1960.335,  2178.15 ,  2395.965,  2613.78 ,  2831.595,  3049.41 ])

In[87]: zzn
Out[87]: 
array([ 0.4837052 ,  0.3976288 ,  0.3076519 ,  0.2105963 ,  0.1015546 ,
        0.1162558 ,  0.1723646 ,  0.2173536 ,  0.2547635 ,  0.3767569 ,
        0.3196527 ,  0.2606447 ,  0.1983554 ,  0.1291423 ,  0.09786849,
        0.1277448 ,  0.1560009 ,  0.1802875 ,  0.3420683 ,  0.2938885 ,
        0.2452067 ,  0.1958042 ,  0.144459  ,  0.1026045 ,  0.1086459 ,
        0.1256328 ,  0.1419562 ,  0.3090272 ,  0.2726449 ,  0.236535  ,
        0.200679  ,  0.1647521 ,  0.1310315 ,  0.1132389 ,  0.1129602 ,
        0.118809  ,  0.284265  ,  0.257173  ,  0.2310047 ,  0.205817  ,
        0.18154   ,  0.1586908 ,  0.1393701 ,  0.1264879 ,  0.1204383 ,
        0.2760804 ,  0.2540095 ,  0.2330927 ,  0.2133592 ,  0.1947658 ,
        0.1775263 ,  0.1622754 ,  0.1498286 ,  0.1407699 ,  0.274541  ,
        0.2560495 ,  0.2387175 ,  0.222547  ,  0.2075007 ,  0.1936717 ,
        0.1812974 ,  0.1706293 ,  0.1618527 ,  0.2802191 ,  0.2641784 ,
        0.2491889 ,  0.2352521 ,  0.2223443 ,  0.2105051 ,  0.199825  ,
        0.1903785 ,  0.1822064 ])

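I want to find the fastest way to get the zzn value at the position where xxn and yyn match a given pair. For example, [1, 2395.965] should return 0.1310315, which is the zzn value at the position where xxn == 1 and yyn == 2395.965.

In pandas I would do zz[(xx == 1) & (yy == 2395.965)], which gives 0.1310315, but unfortunately this runs inside a huge loop and it is slow.

Any help is appreciated, thanks!

Edit:

My current loop uses pandas, like this: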

for coordinate in df.itertuples():
    sTL = zz[(xx == x_match) & (yy == y_match)].values
    sBL = zz[(xx == x_match) & (yy == sB)].values
    sTR = zz[(xx == sR) & (yy == y_match)].values
    sBR = zz[(xx == sR) & (yy == sB)].values

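where each coordinate row supplies the x_match, y_match, sR and sB values, and there are 100k rows.

3 Answers:

Answer 0: (score: 1)

You can stack xxn and yyn into a single array, search this new array, and use the result to pull the values from zzn: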
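Since xxn and yyn in the question form a regular grid (each x level repeated, the y levels tiled), one way to avoid the per-row loop is to build a 2-D table of z once and index it for all query rows at the same time. The following is only a sketch under that assumption. The names `z_grid` and `lookup_many` are hypothetical, the small arrays stand in for the real data, and the queries are assumed to be exact grid values, so float equality is used as in the question.

```python
import numpy as np

# Small stand-in grid with the same layout as xxn/yyn/zzn (x repeated, y tiled).
x_levels = np.array([0.25, 0.5, 1.0])
y_levels = np.array([1306.89, 1524.705, 1742.52])
xxn = np.repeat(x_levels, y_levels.size)
yyn = np.tile(y_levels, x_levels.size)
zzn = np.arange(xxn.size, dtype=float) / 10.0

# Build the lookup table once: rows indexed by x level, columns by y level.
ux = np.unique(xxn)
uy = np.unique(yyn)
z_grid = np.full((ux.size, uy.size), np.nan)
z_grid[np.searchsorted(ux, xxn), np.searchsorted(uy, yyn)] = zzn

def lookup_many(x_query, y_query):
    """Return z for arrays of (x, y) pairs; NaN where a pair is not on the grid."""
    # searchsorted can return len(levels) for values past the end, so clip.
    ix = np.clip(np.searchsorted(ux, x_query), 0, ux.size - 1)
    iy = np.clip(np.searchsorted(uy, y_query), 0, uy.size - 1)
    found = (ux[ix] == x_query) & (uy[iy] == y_query)
    return np.where(found, z_grid[ix, iy], np.nan)

# All four corners (sTL, sBL, sTR, sBR) could be fetched the same way,
# one call per corner, each covering every row at once.
res = lookup_many(np.array([0.5, 1.0, 2.0]),
                  np.array([1524.705, 1306.89, 1306.89]))
print(res)
```

With this layout the cost per query is a binary search rather than a scan over the whole array, which is what makes it attractive for 100k rows.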

a = numpy.vstack((xxn, yyn)).T

idx = numpy.all(a==numpy.array([1.0, 2395.965]), axis=1)
print zzn[idx]
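For reference, here is a self-contained variant of the same stacking idea that also runs on Python 3. It is a sketch rather than the answer's exact code: it uses `numpy.column_stack` in place of `vstack(...).T`, and it swaps the exact `==` comparison for `numpy.isclose`, on the assumption that the coordinates may carry floating-point rounding. The four sample points are taken from the question's data.

```python
import numpy as np

# Four points copied from the question's xxn / yyn / zzn arrays.
xxn = np.array([0.5, 0.5, 1.0, 1.0])
yyn = np.array([2178.15, 2395.965, 2178.15, 2395.965])
zzn = np.array([0.144459, 0.1026045, 0.1647521, 0.1310315])

# One row per point: column 0 is x, column 1 is y.
pairs = np.column_stack((xxn, yyn))
target = np.array([1.0, 2395.965])

# A row matches only if both of its coordinates are close to the target.
idx = np.all(np.isclose(pairs, target), axis=1)
print(zzn[idx])
```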

Answer 1: (score: 0)

After some investigation, I came up with a simple approach:

np.where((xxn == x_match) & (yyn ==y_match), zzn, 0).sum()

This looks much faster than the pandas equivalent:

 %timeit np.where((xxn == x_match) & (yyn ==y_match), zzn, 0).sum()
The slowest run took 8.72 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 8.19 µs per loop

 %timeit zz[(xx == x_match) & (yy == y_match)].values
1000 loops, best of 3: 1.43 ms per loop
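One thing to keep in mind with the `np.where(...).sum()` form: it returns 0 when no point matches and adds the values together when several points match, so a miss cannot be told apart from a genuine zero. A possible alternative, sketched below with a hypothetical helper named `lookup_one`, indexes with the boolean mask directly and checks that exactly one point was found.

```python
import numpy as np

def lookup_one(xxn, yyn, zzn, x_match, y_match):
    """Return the single z value at (x_match, y_match), or raise KeyError."""
    mask = (xxn == x_match) & (yyn == y_match)
    hits = zzn[mask]
    if hits.size != 1:
        raise KeyError("expected exactly one match, found %d" % hits.size)
    return hits[0]

# Four points copied from the question's data.
xxn = np.array([0.5, 0.5, 1.0, 1.0])
yyn = np.array([2178.15, 2395.965, 2178.15, 2395.965])
zzn = np.array([0.144459, 0.1026045, 0.1647521, 0.1310315])

value = lookup_one(xxn, yyn, zzn, 1.0, 2395.965)

# The sum-based form silently yields 0.0 for a coordinate that is absent.
missing = np.where((xxn == 9.0) & (yyn == 2395.965), zzn, 0).sum()
```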

Answer 2: (score: 0)

Here is how I would do it in Pandas:

xyz = pd.DataFrame({'x':xxn, 'y':yyn, 'z':zzn})
xyz.set_index(['x', 'y'], inplace=True)

hunt = pd.DataFrame({'x':df[:,0], 'y':df[:,1]}) # coords to look for
print hunt.join(xyz, ['x', 'y'])
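A self-contained sketch of the same batch lookup that also runs on Python 3 is shown below. It uses `DataFrame.merge` with `how="left"` instead of `set_index` plus `join`, which should give an equivalent result here. The `coords` array is a hypothetical stand-in for the coordinates to look up, and the four grid points are taken from the question's data.

```python
import numpy as np
import pandas as pd

# Four grid points copied from the question's data.
xyz = pd.DataFrame({
    "x": [0.5, 0.5, 1.0, 1.0],
    "y": [2178.15, 2395.965, 2178.15, 2395.965],
    "z": [0.144459, 0.1026045, 0.1647521, 0.1310315],
})

# Hypothetical coordinates to look up; the last pair is not on the grid.
coords = np.array([[1.0, 2395.965],
                   [0.5, 2178.15],
                   [9.0, 2178.15]])
hunt = pd.DataFrame({"x": coords[:, 0], "y": coords[:, 1]})

# A left merge keeps every query row in order; unmatched rows get NaN in z.
result = hunt.merge(xyz, on=["x", "y"], how="left")
print(result)
```

The merge matches on exact float equality, so this assumes the query coordinates are bit-for-bit the same values as those stored in the grid.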