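Remove elements from one array that are in another array

Time: 2016-10-15 06:36:17

Tags: python arrays numpy

Suppose I have these 2D arrays A and B.

How can I remove the elements of A that are also in B? (Complement in set theory: A-B)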

A=np.asarray([[1,1,1], [1,1,2], [1,1,3], [1,1,4]])
B=np.asarray([[0,0,0], [1,0,2], [1,0,3], [1,0,4], [1,1,0], [1,1,1], [1,1,4]])
#output = [[1,1,2], [1,1,3]]

More precisely, I would like to do something like this.

data = some numpy array
label = some numpy array
A = np.argwhere(label==0) #[[1 1 1], [1 1 2], [1 1 3], [1 1 4]]
B = np.argwhere(data>1.5) #[[0 0 0], [1 0 2], [1 0 3], [1 0 4], [1 1 0], [1 1 1], [1 1 4]]
out = np.argwhere(label==0 and data>1.5) #[[1 1 2], [1 1 3]]
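Since A and B in the snippet above both come from boolean masks over the same grid, the goal can also be read as a mask combination rather than a row-wise set difference. The following is a minimal sketch under that reading, assuming data and label have the same shape; the random arrays are placeholders, not the asker's data. Note that Python's `and` does not work elementwise on arrays, so the sketch uses `&` and `~`.

```python
import numpy as np

# Placeholder inputs (assumption): any two arrays of the same shape would do.
rng = np.random.default_rng(0)
data = rng.uniform(0, 3, size=(2, 2, 5))
label = rng.integers(0, 2, size=(2, 2, 5))

A = np.argwhere(label == 0)   # positions where label is 0
B = np.argwhere(data > 1.5)   # positions where data exceeds 1.5

# Positions that are in A but not in B: combine the masks directly.
out = np.argwhere((label == 0) & ~(data > 1.5))
print(out)
```

This avoids comparing index rows at all, but it only applies when the original masks are still available.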

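5 answers:

Answer 0 (score: 12)

Based on this solution to Find the row indexes of several values in a numpy array, here is a NumPy based solution with a smaller memory footprint, which could be beneficial when working with large arrays -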

dims = np.maximum(B.max(0),A.max(0))+1
out = A[~np.in1d(np.ravel_multi_index(A.T,dims),np.ravel_multi_index(B.T,dims))]

Sample run -

In [38]: A
Out[38]: 
array([[1, 1, 1],
       [1, 1, 2],
       [1, 1, 3],
       [1, 1, 4]])

In [39]: B
Out[39]: 
array([[0, 0, 0],
       [1, 0, 2],
       [1, 0, 3],
       [1, 0, 4],
       [1, 1, 0],
       [1, 1, 1],
       [1, 1, 4]])

In [40]: out
Out[40]: 
array([[1, 1, 2],
       [1, 1, 3]])

Runtime test on large arrays -

In [107]: def in1d_approach(A,B):
     ...:     dims = np.maximum(B.max(0),A.max(0))+1
     ...:     return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
     ...:                     np.ravel_multi_index(B.T,dims))]
     ...: 

In [108]: # Setup arrays with B as large array and A contains some of B's rows
     ...: B = np.random.randint(0,9,(1000,3))
     ...: A = np.random.randint(0,9,(100,3))
     ...: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)
     ...: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)
     ...: A[A_idx] = B[B_idx]
     ...: 

Timings with broadcasting based solutions -

In [109]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100 loops, best of 3: 4.64 ms per loop # @Kasramvd's soln

In [110]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100 loops, best of 3: 3.66 ms per loop

Timing with the smaller memory footprint solution -

In [111]: %timeit in1d_approach(A,B)
1000 loops, best of 3: 231 µs per loop

Further performance boost

in1d_approach reduces each row to a single number by treating the row as an indexing tuple. We can do the same a bit more efficiently by introducing matrix multiplication with np.dot, like so -

def in1d_dot_approach(A,B):
    cumdims = (np.maximum(A.max(),B.max())+1)**np.arange(B.shape[1])
    return A[~np.in1d(A.dot(cumdims),B.dot(cumdims))]

Let's test it against the previous approach on much larger arrays -
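As a quick sanity check of the two row-encoding functions in this answer, the sketch below compares both against a plain Python reference on random data. It is an illustration under stated assumptions, not a benchmark: the rows must hold non-negative integers, the encoded values must fit in the integer dtype, and np.isin is used here in place of np.in1d because newer NumPy releases recommend it. The seed and array sizes are arbitrary choices.

```python
import numpy as np

def in1d_approach(A, B):
    # Per-column sizes, so every row maps to a distinct flat index.
    dims = np.maximum(B.max(0), A.max(0)) + 1
    return A[~np.isin(np.ravel_multi_index(tuple(A.T), dims),
                      np.ravel_multi_index(tuple(B.T), dims))]

def in1d_dot_approach(A, B):
    # Treat each row as the digits of a number in a common base.
    base = max(A.max(), B.max()) + 1
    cumdims = base ** np.arange(B.shape[1])
    return A[~np.isin(A.dot(cumdims), B.dot(cumdims))]

rng = np.random.default_rng(0)
B = rng.integers(0, 9, (1000, 3))
A = rng.integers(0, 9, (100, 3))
A[:10] = B[:10]  # make sure some rows of A are present in B

# Reference result built from hashable tuples.
B_set = set(map(tuple, B.tolist()))
expected = [row for row in A.tolist() if tuple(row) not in B_set]

print(in1d_approach(A, B).tolist() == expected)
print(in1d_dot_approach(A, B).tolist() == expected)
```

Both functions keep the original order of A and keep duplicate rows of A that are not in B, which is why they can be compared directly with the list-based reference.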

Answer 1 (score: 10)

Here is a Numpythonic approach using broadcasting:

In [83]: A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
Out[83]: 
array([[1, 1, 2],
       [1, 1, 3]])
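To make the broadcasting step easier to follow, here is a sketch that spells out the intermediate shapes, using the equality form that also appears in this thread's timings. The variable names eq, row_match and in_B are illustrative only.

```python
import numpy as np

A = np.array([[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
              [1, 1, 0], [1, 1, 1], [1, 1, 4]])

# A[:, None, :] has shape (4, 1, 3); comparing it with B of shape (7, 3)
# broadcasts to (4, 7, 3): every row of A against every row of B.
eq = A[:, None, :] == B

# A row of A matches a row of B only if all three columns are equal.
row_match = eq.all(axis=-1)    # shape (4, 7)

# A row of A is "in B" if it matches at least one row of B.
in_B = row_match.any(axis=1)   # shape (4,)

out = A[~in_B]
print(out)
```

The temporary array holds len(A) * len(B) * ncols booleans, which is why this form gets expensive in memory as the inputs grow.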

Here are timings against the other answers:

In [90]: def cal_diff(A, B):
   ....:     A_rows = A.view([('', A.dtype)] * A.shape[1])
   ....:     B_rows = B.view([('', B.dtype)] * B.shape[1])
   ....:     return np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1])
   ....: 

In [93]: %timeit cal_diff(A, B)
10000 loops, best of 3: 54.1 µs per loop

In [94]: %timeit A[np.all(np.any((A-B[:, None]), axis=2), axis=0)]
100000 loops, best of 3: 9.41 µs per loop

# Even better with Divakar's suggestion
In [97]: %timeit A[~((A[:,None,:] == B).all(-1)).any(1)]
100000 loops, best of 3: 7.41 µs per loop

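Well, if you are looking for a faster way, you should look for ways to reduce the number of comparisons. In this case (without considering the order) you can generate a single number from each row and compare those numbers, which can be done by summing the squares of the items.

Here is the benchmark against Divakar's in1d approach: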

In [144]: def in1d_approach(A,B):
   .....:         dims = np.maximum(B.max(0),A.max(0))+1
   .....:         return A[~np.in1d(np.ravel_multi_index(A.T,dims),\
   .....:                          np.ravel_multi_index(B.T,dims))]
   .....: 

In [146]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 23.8 µs per loop

In [145]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 20.2 µs per loop
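One caveat about the sum-of-squares key is that different rows can share the same sum, even apart from ordering. The sketch below uses a small hand-picked example (not taken from the thread) to show such a collision, next to the ravel_multi_index encoding, which assigns a distinct index to each distinct row. np.isin is used here instead of np.in1d; that substitution is an assumption about the NumPy version in use.

```python
import numpy as np

# Hand-picked rows: 0 + 0 + 9 == 1 + 4 + 4 == 9, so the keys collide.
A = np.array([[0, 0, 3], [1, 1, 2]])
B = np.array([[1, 2, 2]])

# Sum-of-squares key: [0, 0, 3] is wrongly treated as present in B.
key_A = np.power(A, 2).sum(1)
key_B = np.power(B, 2).sum(1)
lossy = A[~np.isin(key_A, key_B)]

# Exact key: each distinct row gets a distinct flat index.
dims = np.maximum(A.max(0), B.max(0)) + 1
exact = A[~np.isin(np.ravel_multi_index(tuple(A.T), dims),
                   np.ravel_multi_index(tuple(B.T), dims))]

print(lossy)   # only [1, 1, 2] survives
print(exact)   # both rows of A survive, since neither is in B
```

So the sum-of-squares variant is best seen as a fast heuristic that is safe only when the data rules out such collisions.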

You can use np.diff to get an order-independent result:

In [194]: B=np.array([[0, 0, 0,], [1, 0, 2,], [1, 0, 3,], [1, 0, 4,], [1, 1, 0,], [1, 1, 1,], [1, 1, 4,], [4, 1, 1]])

In [195]: A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
Out[195]: 
array([[1, 1, 2],
       [1, 1, 3]])

In [196]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 30.7 µs per loop

Benchmark with Divakar's setup:

In [198]: B = np.random.randint(0,9,(1000,3))

In [199]: A = np.random.randint(0,9,(100,3))

In [200]: A_idx = np.random.choice(np.arange(A.shape[0]),size=10,replace=0)

In [201]: B_idx = np.random.choice(np.arange(B.shape[0]),size=10,replace=0)

In [202]: A[A_idx] = B[B_idx]

In [203]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
10000 loops, best of 3: 137 µs per loop

In [204]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
10000 loops, best of 3: 112 µs per loop

In [205]: %timeit in1d_approach(A, B)
10000 loops, best of 3: 115 µs per loop

Timings with larger arrays (Divakar's solution is slightly faster):

In [231]: %timeit A[~np.in1d(np.diff(np.diff(np.power(A, 2))), np.diff(np.diff(np.power(B, 2))))]
1000 loops, best of 3: 1.01 ms per loop

In [232]: %timeit A[~np.in1d(np.power(A, 2).sum(1), np.power(B, 2).sum(1))]
1000 loops, best of 3: 880 µs per loop

In [233]:  %timeit in1d_approach(A, B)
1000 loops, best of 3: 807 µs per loop

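Answer 2 (score: 9)

There is an easy solution with a list comprehension, assuming A and B are plain Python lists of lists (for example A.tolist()) rather than NumPy arrays. On a NumPy array, `i in B` tests whether any single element matches, not whether a whole row matches.

A = [i for i in A if i not in B]

Result

[[1, 1, 2], [1, 1, 3]]

The list comprehension does not remove elements from the original list; it builds a new list and reassigns the name A.

If you want to remove the elements in place, use this loop instead:

for i in B:
    if i in A:
        A.remove(i)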

Answer 3 (score: 5)

If you want to do it the NumPy way,

import numpy as np

A = np.array([[1, 1, 1,], [1, 1, 2], [1, 1, 3], [1, 1, 4]])
B = np.array([[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4], [1, 1, 0], [1, 1, 1], [1, 1, 4]])
A_rows = A.view([('', A.dtype)] * A.shape[1])
B_rows = B.view([('', B.dtype)] * B.shape[1])

diff_array = np.setdiff1d(A_rows, B_rows).view(A.dtype).reshape(-1, A.shape[1])

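As @Rahul suggested, for an easy non-NumPy solution (converting the arrays to lists first, so that whole rows are compared),

B_list = B.tolist()
diff_array = [i for i in A.tolist() if i not in B_list]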

Answer 4 (score: 4)

Another non-NumPy solution:

[i for i in A if i not in B]
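For larger inputs, the membership test in the comprehension above scans B once per row of A. A possible variation, sketched here on the assumption that the rows are plain lists of hashable values, is to turn B into a set of tuples first. The name B_rows is illustrative; NumPy arrays would need A.tolist() and B.tolist() beforehand.

```python
A = [[1, 1, 1], [1, 1, 2], [1, 1, 3], [1, 1, 4]]
B = [[0, 0, 0], [1, 0, 2], [1, 0, 3], [1, 0, 4],
     [1, 1, 0], [1, 1, 1], [1, 1, 4]]

# Lists are not hashable, so each row is converted to a tuple for the set.
B_rows = {tuple(row) for row in B}

# Keep the rows of A, in order, whose tuple form is not in B.
out = [row for row in A if tuple(row) not in B_rows]
print(out)
```

The result keeps the original order of A and any duplicate rows of A that are absent from B.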