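I need the cross-mapped indices for NumPy union and intersection operations. The code below works fine, but I'd like to vectorize it before applying it to large datasets. Alternatively, if there is a better, built-in way, what is it?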
import numpy as np

# ------- define the arrays and set operations ---------
A = np.array(['a','b','c','e','f','g','h','j'])
B = np.array(['h','i','j','k','m'])
C = np.union1d(A, B)
D = np.intersect1d(A,B)
# ------- get the mapped indices for the union ----
zc = np.empty((len(C),3,))
zc[:]=np.nan
zc[:,0] = range(0,len(C))
for iy in range(0,len(C)):
    for ix in range(0, len(A)):
        if A[ix] == C[iy]:
            zc[iy,1] = ix
    for ix in range(0, len(B)):
        if B[ix] == C[iy]:
            zc[iy,2] = ix
# ------- get the mapped indices for the intersection ----
zd = np.empty((len(D),3,))
zd[:]=np.nan
zd[:,0] = range(0,len(D))
for iy in range(0,len(D)):
    for ix in range(0, len(A)):
        if A[ix] == D[iy]:
            zd[iy,1] = ix
    for ix in range(0, len(B)):
        if B[ix] == D[iy]:
            zd[iy,2] = ix
Answer (score: 2)
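For cases like this, you may want to convert the strings to numbers first, since working with numbers is more efficient. Also, given that the outputs are numeric arrays, it makes more sense to have numeric IDs up front. To convert to numeric IDs, I have seen people use lambda and similar approaches, but I would use np.unique, which is quite efficient for cases like this. Here's an implementation that starts with the numeric-ID conversion: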
# ------------------------ Setup work -------------------------------
_,idx1 = np.unique(np.append(A,B),return_inverse=True)
A_ID = idx1[:A.size]
B_ID = idx1[A.size:]
# ------------------------ Union work -------------------------------
# Get length of zc, which would be the max of ID+1.
lenC = idx1.max()+1
# Initialize output array zc and fill with NaNs.
zc1 = np.empty((lenC,3,))
zc1[:]=np.nan
# Fill first column with consecutive numbers starting with 0
zc1[:,0] = range(0,lenC)
# Most important part of the code:
# In columns 1 and 2, at the rows given by the IDs of A and B respectively,
# store each element's position in A or B (0 .. size-1)
zc1[A_ID,1] = np.arange(A_ID.size)
zc1[B_ID,2] = np.arange(B_ID.size)
# ------------------------ Intersection work -------------------------------
# Get the (index in A, index in B) pairs whose IDs match; argwhere returns
# them in row-major order, i.e. sorted by position in A
intersect_ID = np.argwhere(A_ID[:,None] == B_ID)
# Initialize output zd based on the number of intersections
lenD = intersect_ID.shape[0]
zd1 = np.empty((lenD,3,))
zd1[:] = np.nan
# Fill first column with consecutive numbers starting with 0
zd1[:,0] = range(0,lenD)
zd1[:,1:] = intersect_ID
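On the question's "is there a built-in?" point: in NumPy 1.15 and later, np.intersect1d can return the indices directly via return_indices=True. For the union there is no such flag, but because np.union1d returns a sorted array, np.searchsorted can find the row of every element of A and B in it. This also avoids the broadcasted comparison A_ID[:,None] == B_ID, which builds a len(A) x len(B) boolean matrix and can get large. The following is a minimal sketch, assuming A and B contain no repeated labels (as in the example); the names zc2 and zd2 are only illustrative.

```python
import numpy as np

A = np.array(['a', 'b', 'c', 'e', 'f', 'g', 'h', 'j'])
B = np.array(['h', 'i', 'j', 'k', 'm'])

# ---- Intersection: let np.intersect1d report the indices ----
# D holds the sorted common values; ia / ib are the positions of those values
# in A and B (the first occurrence if a value were repeated).
D, ia, ib = np.intersect1d(A, B, return_indices=True)
zd2 = np.column_stack((np.arange(D.size), ia, ib)).astype(float)

# ---- Union: locate each input value in the sorted union ----
C = np.union1d(A, B)  # sorted, unique labels
zc2 = np.full((C.size, 3), np.nan)
zc2[:, 0] = np.arange(C.size)
# searchsorted gives, for every element of A (or B), its row in C.
# Assumes no repeated labels in A or B, as in the question's data.
zc2[np.searchsorted(C, A), 1] = np.arange(A.size)
zc2[np.searchsorted(C, B), 2] = np.arange(B.size)
```

For the example data this gives the same tables as zc1 and zd1 above. In general the row order of the intersection can differ: np.intersect1d orders rows by value, while the argwhere version orders them by position in A; the two coincide here because A is already sorted.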