Suppose I have these 2D arrays A and B.
How can I remove the elements of A that are also in B? (Complement in set theory: A-B)
A=np.asarray([[1,1,1], [1,1,2], [1,1,3], [1,1,4]])
B=np.asarray([[0,0,0], [1,0,2], [1,0,3], [1,0,4], [1,1,0], [1,1,1], [1,1,4]])
#output = [[1,1,2], [1,1,3]]
More precisely, I would like to do something like this.
data = some numpy array
label = some numpy array
A = np.argwhere(label==0) #[[1 1 1], [1 1 2], [1 1 3], [1 1 4]]
B = np.argwhere(data>1.5) #[[0 0 0], [1 0 2], [1 0 3], [1 0 4], [1 1 0], [1 1 1], [1 1 4]]
out = np.argwhere(label==0 and data>1.5) #[[1 1 2], [1 1 3]]
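If A and B really do come from np.argwhere on two arrays of the same shape, one option is to combine the boolean masks before calling np.argwhere. The sketch below assumes that; the shape and the contents of label and data are made up so that they reproduce the A and B from the question. Note that Python's `and` does not work elementwise on arrays, so `&` and `~` are used instead.

```python
import numpy as np

# Hypothetical arrays, built only so that the masks reproduce A and B above.
shape = (2, 2, 5)
label = np.ones(shape, dtype=int)
data = np.zeros(shape)

A = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
              [1, 1, 0], [1, 1, 1], [1, 1, 4]])
label[tuple(A.T)] = 0      # label == 0 exactly at the rows of A
data[tuple(B.T)] = 2.0     # data > 1.5 exactly at the rows of B

# Positions where label == 0 holds but data > 1.5 does not, i.e. A - B.
out = np.argwhere((label == 0) & ~(data > 1.5))
print(out)
```

This avoids building the two index lists at all, which may be simpler when the masks are available.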
Answer 0 (score: 12)
Based on this solution to Find the row indexes of several values in a numpy array, here's a NumPy based solution with a smaller memory footprint that could be beneficial when working with large arrays -
dims = np.maximum(B.max(0),A.max(0))+1
out = A[~np.in1d(np.ravel_multi_index(A.T,dims),np.ravel_multi_index(B.T,dims))]
Sample run -
In [38]: A
Out[38]:
array([[1, 1, 1],
[1, 1, 2],
[1, 1, 3],
[1, 1, 4]])
In [39]: B
Out[39]:
array([[0, 0, 0],
[1, 0, 2],
[1, 0, 3],
[1, 0, 4],
[1, 1, 0],
[1, 1, 1],
[1, 1, 4]])
In [40]: out
Out[40]:
array([[1, 1, 2],
[1, 1, 3]])
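To make the linear-index idea concrete, here is a step-by-step sketch on the same A and B. It is an illustration, not a replacement for the answer's code: np.isin is used in place of np.in1d (which newer NumPy releases deprecate), and the intermediate names A_lin, B_lin and keep are introduced only for this example.

```python
import numpy as np

A = np.asarray([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.asarray([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
                [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# Smallest grid shape that can hold every row of A and B as an index tuple.
dims = tuple(np.maximum(B.max(0), A.max(0)) + 1)    # (2, 2, 5)

# Each row (i, j, k) becomes the single integer i*10 + j*5 + k on that grid.
A_lin = np.ravel_multi_index(tuple(A.T), dims)      # [16, 17, 18, 19]
B_lin = np.ravel_multi_index(tuple(B.T), dims)

# Rows of A whose linear index does not occur among B's linear indices.
keep = ~np.isin(A_lin, B_lin)
out = A[keep]
print(out)
```

Because every row maps to a distinct integer on the grid, comparing the 1D integer arrays is equivalent to comparing whole rows, as long as all entries are non-negative integers.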
Runtime test on large arrays -
In [107]: def in1d_approach(A,B):
...: dims = np.maximum(B.max(0),A.max(0))+1
...: return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
...: np.ravel_multi_index(B.T,dims))]
...:
In [108]: # Setup arrays with B as large array and A contains some of B's rows
...: B = np.random.randint(0,9,(1000,3))
...: A = np.random.randint(0,9,(100,3))
...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
...: A[A_idx] = B[B_idx]
...:
Timings with broadcasting based solutions -
In [109]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100 loops, best of 3: 4.64 ms per loop # @Kasramvd's soln
In [110]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100 loops, best of 3: 3.66 ms per loop
Timings with the less memory footprint based solution -
In [111]: %timeit in1d_approach(A,B)
1000 loops, best of 3: 231 µs per loop
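Further performance boost
in1d_approach reduces each row by treating it as an indexing tuple. We can do the same a bit more efficiently by introducing matrix multiplication with np.dot. Here it is as a function that can be tested on much larger arrays -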
def in1d_dot_approach(A,B):
cumdims = (np.maximum(A.max(),B.max())+1)**np.arange(B.shape[1])
return A[~np.in1d(A.dot(cumdims),B.dot(cumdims))]
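One way to run such a test is sketched below. The array sizes, the value range and the seed are arbitrary assumptions, np.isin stands in for np.in1d, and no timings are quoted because they depend on the machine and the NumPy version.

```python
import timeit
import numpy as np

def in1d_approach(A, B):
    # Row -> linear index on a grid just large enough for all rows.
    dims = tuple(np.maximum(B.max(0), A.max(0)) + 1)
    return A[~np.isin(np.ravel_multi_index(tuple(A.T), dims),
                      np.ravel_multi_index(tuple(B.T), dims))]

def in1d_dot_approach(A, B):
    # Row -> integer via a dot product with powers of (global max + 1).
    cumdims = (np.maximum(A.max(), B.max()) + 1) ** np.arange(B.shape[1])
    return A[~np.isin(A.dot(cumdims), B.dot(cumdims))]

# Hypothetical test data: non-negative integers, B much larger than A.
rng = np.random.default_rng(0)
B = rng.integers(0, 100, (10000, 3))
A = rng.integers(0, 100, (1000, 3))
A[:10] = B[:10]                      # make sure some rows of A occur in B

# Both functions should select exactly the same rows.
same = np.array_equal(in1d_approach(A, B), in1d_dot_approach(A, B))
print("results agree:", same)

for fn in (in1d_approach, in1d_dot_approach):
    seconds = timeit.timeit(lambda: fn(A, B), number=100)
    print(f"{fn.__name__}: {seconds / 100 * 1e6:.1f} µs per call")
```

Both keys assume non-negative integer entries; with large values or many columns the powers in cumdims can overflow the integer dtype, so the range of the data is worth checking first.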
Answer 1 (score: 10)
Here is a Numpythonic approach with broadcasting:
In [83]: A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
Out[83]:
array([[1, 1, 2],
[1, 1, 3]])
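To see what broadcasting does here, it may help to split the equivalent equality-based form, A[~((A[:,None,:] == B).all(-1)).any(1)], into separate steps. The intermediate names below are introduced only for this sketch.

```python
import numpy as np

A = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
              [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# Compare every row of A with every row of B, element by element.
eq = A[:, None, :] == B          # shape (4, 7, 3)

# A pair of rows matches only if all of its elements are equal.
row_match = eq.all(-1)           # shape (4, 7)

# A row of A is "in B" if it matches at least one row of B.
in_B = row_match.any(1)          # shape (4,)

out = A[~in_B]
print(out)
```

The intermediate array has len(A) * len(B) * ncols elements, which is why this form is convenient for small inputs but memory-hungry for large ones.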
Here are the timings compared with the other answer:
In [90]: def cal_diff(A, B):
....: A_rows = A.view([('', A.dtype)] * A.shape[1])
....: B_rows = B.view([('', B.dtype)] * B.shape[1])
....: return np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1])
....:
In [93]: %timeit cal_diff(A, B)
10000 loops, best of 3: 54.1 µs per loop
In [94]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100000 loops, best of 3: 9.41 µs per loop
# Even better with Divakar's suggestion
In [97]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100000 loops, best of 3: 7.41 µs per loop
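Well, if you are looking for a faster way, you should look for ways that reduce the number of comparisons. In this case (without considering the order) you can generate a single number from each row and compare those numbers, which can be done by summing the squares of the items. Note that such a sum is not guaranteed to be unique, so distinct rows can end up with the same number.
Here is the benchmark with Divakar's in1d approach: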
In [144]: def in1d_approach(A,B):
.....: dims = np.maximum(B.max(0),A.max(0))+1
.....: return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
.....: np.ravel_multi_index(B.T,dims))]
.....:
In [146]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 23.8 µs per loop
In [145]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 20.2 µs per loop
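Before relying on the sum-of-squares key, it is worth checking how it behaves on rows that differ but share the same sum. The rows below are made up for that purpose; the exact row comparison is included only as a reference.

```python
import numpy as np

# Hypothetical rows: [0, 3, 4] and [0, 0, 5] both have a sum of squares of 25.
A = np.array([[0, 3, 4], [1, 1, 2]])
B = np.array([[0, 0, 5]])

key_A = np.power(A, 2).sum(1)    # [25, 6]
key_B = np.power(B, 2).sum(1)    # [25]

# Key-based difference: [0, 3, 4] is dropped although it is not a row of B.
by_key = A[~np.isin(key_A, key_B)]

# Exact row-wise difference for comparison: both rows of A are kept.
exact = A[~(A[:, None, :] == B).all(-1).any(1)]

print(by_key)
print(exact)
```

So the key is fast, but it is only safe when the data is known not to contain such collisions.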
You can use np.diff to get an order-independent result:
In [194]: B=np.array([[0, 0, 0,], [1, 0, 2,], [1, 0, 3,], [1, 0, 4,], [1, 1, 0,], [1, 1, 1,], [1, 1, 4,], [4, 1, 1]])
In [195]: A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
Out[195]:
array([[1, 1, 2],
[1, 1, 3]])
In [196]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 30.7 µs per loop
Benchmark with Divakar's setup:
In [198]: B = np.random.randint(0,9,(1000,3))
In [199]: A = np.random.randint(0,9,(100,3))
In [200]: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
In [201]: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
In [202]: A[A_idx] = B[B_idx]
In [203]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 137 µs per loop
In [204]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 112 µs per loop
In [205]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 115 µs per loop
Timings with larger arrays (Divakar's solution is slightly faster):
In [231]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
1000 loops, best of 3: 1.01 ms per loop
In [232]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
1000 loops, best of 3: 880 µs per loop
In [233]: %timeit in1d_approach(A, B)
1000 loops, best of 3: 807 µs per loop
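Answer 2 (score: 9)
There is an easy solution with a list comprehension, assuming A and B are plain Python lists (for example A.tolist() and B.tolist()):
A = [i for i in A if i not in B]
Result
[[1, 1, 2], [1, 1, 3]]
The list comprehension does not remove elements from the original list; it builds a new list and reassigns the name.
If you want to remove the elements in place, use this method:
for i in B:
    if i in A:
        A.remove(i)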
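The difference between reassigning and removing in place can be seen in a small sketch. It assumes A and B are plain Python lists of lists (list.remove does not exist on NumPy arrays), and the name original is introduced only to keep a second reference to the first list.

```python
A = [[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]]
B = [[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
     [1, 1, 0], [1, 1, 1], [1, 1, 4]]

original = A                        # second name for the same list object

# The comprehension builds a new list; the object behind `original` is untouched.
A = [i for i in A if i not in B]
len_before = len(original)          # still 4

# In-place removal mutates the list object itself.
for i in B:
    if i in original:
        original.remove(i)

print(A)
print(original)
```

Both variants scan B once per row of A, so they suit small inputs rather than large arrays.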
Answer 3 (score: 5)
If you want to do it the NumPy way,
import numpy as np
A = np.array([[1, 1, 1,], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4], [1, 1, 0], [1, 1, 1], [1, 1, 4]])
A_rows = A.view([('', A.dtype)] * A.shape[1])
B_rows = B.view([('', B.dtype)] * B.shape[1])
diff_array = np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1])
As @Rahul suggested, for a simple non-NumPy solution,
diff_array = [i for i in A if i not in B]
Answer 4 (score: 4)
Another non-NumPy solution:
[i for i in A if i not in B]
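One caveat if A and B are still NumPy arrays: `i not in B` then tests whether any element of B equals the corresponding element of i, not whether the whole row occurs in B. A sketch of a pure-Python variant that compares whole rows, using a set of tuples, follows; the names B_rows, out and naive are illustrative.

```python
import numpy as np

A = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
              [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# On arrays, `i not in B` is `not (B == i).any()`, so every row is rejected here.
naive = [i for i in A if i not in B]

# Whole-row comparison: rows as hashable tuples, membership via a set.
B_rows = set(map(tuple, B.tolist()))
out = [row for row in A.tolist() if tuple(row) not in B_rows]

print(naive)
print(out)
```

The set lookup also keeps the cost roughly linear in the number of rows, instead of scanning B for every row of A.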