我有3个不同的numpy数组,但它们都以两列开头,其中包含一年中的某一天和时间。例如:
dyn = [[ 83 12 7.10555687e-01 ..., 6.99242766e-01 6.868761e-01]
[ 83 13 8.28091972e-01 ..., 8.33734118e-01 8.47266838e-01]
[ 83 14 8.79437354e-01 ..., 8.73598144e-01 8.57156213e-01]
[ 161 23 3.28109488e-01 ..., 2.83043689e-01 2.59775391e-01]
[ 162 0 2.23502046e-01 ..., 1.96972086e-01 1.65565263e-01]
[ 162 1 2.51653976e-01 ..., 2.17209188e-01 1.42133495e-1]]
us = [[ 133 18 3.00483815e+02 ..., 1.94277561e+00 2.8168959e+00]
[ 133 19 2.98832620e+02 ..., 2.42506475e+00 2.99730800e+00]
[ 133 20 2.96706105e+02 ..., 3.16851622e+00 4.41187088e+00]
[ 161 23 2.88336560e+02 ..., 3.44864070e-01 3.85055635e-01]
[ 162 0 2.87593240e+02 ..., 2.93002410e-01 2.67112490e-01]
[ 162 2 2.86992180e+02 ..., 7.08996730e-02 2.6403210e-01]]
我需要能够删除所有3个数组中不存在特定日期和时间的行。换句话说,所以我留下3个阵列,其中前3列在3个阵列中的每个阵列中是相同的。
因此产生的较小数组将是:
dyn= [[ 161 23 3.28109488e-01 ..., 2.83043689e-01 2.59775391e-01]
[ 162 0 2.23502046e-01 ..., 1.96972086e-01 1.65565263e-01]]
us= [[ 161 23 2.88336560e+02 ..., 3.44864070e-01 3.85055635e-01]
[ 162 0 2.87593240e+02 ..., 2.93002410e-01 2.67112490e-01]]
(但后来也受限于第三阵列中的内容)
我尝试过使用sort / zip但不确定它是否应该应用于2D数组:
X= dyn
Y = us
xsorted=[x for (y,x) in sorted(zip(Y[:,1],X[:,1]), key=lambda pair: pair[0])]
还有一个循环,但只有当相同的时间/天在数组中的相同位置时才有效,这是没有用的
for i in range(100):
dyn_small=dyn[dyn[:,0]==us[i,0]]
答案 0 :(得分:0)
假设A
,B
和C
作为输入数组,这是一个大量使用broadcasting
的矢量化方法 -
# Get masks comparing all rows of A with B and then B with C
M1 = (A[:,None,:2] == B[:,:2])
M2 = (B[:,None,:2] == C[:,:2])
# Get a joint 3D mask of those two masks and get the indices of matches.
# These indices (I,J,K) of the 3D mask basically tells us the row numbers
# correspondng to each of the input arrays that are present in all of them.
# Thus, in (I,J,K), I would be the matching row number in A, J in B & K in C.
I,J,K = np.where((M1[:,:,None,:] & M2).all(3))
# Finally, select rows of A, B and C with I, J and K respectively
A_new = A[I]
B_new = B[J]
C_new = C[K]
示例运行 -
1)输入:
In [116]: A
Out[116]:
array([[ 83, 12, 443],
[ 83, 13, 565],
[ 83, 14, 342],
[161, 23, 431],
[162, 0, 113],
[162, 1, 313]])
In [117]: B
Out[117]:
array([[161, 23, 999],
[ 5, 1, 13],
[ 83, 12, 15],
[162, 0, 12],
[ 4, 3, 11]])
In [118]: C
Out[118]:
array([[ 11, 23, 143],
[162, 0, 113],
[161, 23, 545]])
2)运行解决方案代码以获取匹配的行ID,从而提取行:
In [119]: M1 = (A[:,None,:2] == B[:,:2])
...: M2 = (B[:,None,:2] == C[:,:2])
...:
In [120]: I,J,K = np.where((M1[:,:,None,:] & M2).all(3))
In [121]: A[I]
Out[121]:
array([[161, 23, 431],
[162, 0, 113]])
In [122]: B[J]
Out[122]:
array([[161, 23, 999],
[162, 0, 12]])
In [123]: C[K]
Out[123]:
array([[161, 23, 545],
[162, 0, 113]])
答案 1 :(得分:0)
numpy_indexed包(免责声明:我是它的作者)包含以优雅,高效/矢量化的方式解决此类问题的功能:
import numpy as np
import numpy_indexed as npi
dyn = np.array(dyn)
us = np.array(us)
dyn_index = npi.as_index(dyn[:, :2])
us_index = npi.as_index(us[:, :2])
common = npi.intersection(dyn_index, us_index)
print(common)
print(dyn[npi.contains(common, dyn_index)])
print(us[npi.contains(common, us_index)])
注意性能NlogN最坏的情况;并且as_index的参数已经按排序顺序排列。相比之下,目前接受的答案是输入大小的二次方。