Intersection of two 2D numpy arrays

Date: 2019-04-15 19:59:50

Tags: python numpy

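I'm looking for a way to get the intersection of two 2D numpy.arrays of shape (n_1, m) and (n_2, m), i.e. the rows they have in common. Note that n_1 and n_2 can differ, but m is the same for both arrays. Here are two minimal examples with the expected results: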

import numpy as np

array1a = np.array([[2], [2], [5], [1]])
array1b = np.array([[5], [2]])

array_intersect(array1a, array1b)
##  array([[2],
##         [5]])


array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])

array_intersect(array2a, array2b)
##  array([[2, 1],
##         [3, 3]])

If anyone has an idea of how I should implement the array_intersect function, I would be very grateful!

5 answers:

Answer 0 (score: 1)

How about using sets?
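
A minimal sketch of that idea (not necessarily what the answerer had in mind), assuming both inputs are 2D arrays with the same number of columns m; the function name array_intersect is taken from the question. It converts every row to a tuple and lets Python's set intersection compare whole rows:

```python
import numpy as np


def array_intersect(a, b):
    """Rows common to a and b, deduplicated and sorted lexicographically."""
    # Each row becomes a hashable tuple, so set intersection compares whole rows
    common = set(map(tuple, a)) & set(map(tuple, b))
    # Sets are unordered: sort the tuples for a deterministic result, and reshape
    # so that an empty intersection still comes back with shape (0, m)
    return np.array(sorted(common), dtype=a.dtype).reshape(-1, a.shape[1])


array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])
print(array_intersect(array2a, array2b))
```

Sorting the tuples reproduces the ordered output shown in the question; drop the sort if order does not matter.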


Answer 1 (score: 0)

Build a set of tuples from the rows of the first array and test each row of the second array against it. Or the other way around.

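import numpy as np

def array_intersect(a, b):
    s = {tuple(x) for x in a}  # rows of a as hashable tuples
    # keep rows of b found in a, then deduplicate (np.unique also sorts the rows)
    return np.unique([x for x in b if tuple(x) in s], axis=0)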
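
np.unique returns the common rows sorted. If you would rather keep them in the order they first appear in b, one option (a variation on this answer, not part of it; the name array_intersect_ordered is hypothetical) is to deduplicate with a dict, whose keys preserve insertion order in Python 3.7+:

```python
import numpy as np


def array_intersect_ordered(a, b):
    """Rows of b that also occur in a, without repeats, in order of first appearance in b."""
    s = {tuple(x) for x in a}  # rows of a as hashable tuples
    # dict.fromkeys drops repeated rows while keeping their first-seen order
    kept = dict.fromkeys(tuple(x) for x in b if tuple(x) in s)
    # reshape keeps the (k, m) shape even when nothing matches
    return np.array(list(kept), dtype=b.dtype).reshape(-1, b.shape[1])


print(array_intersect_ordered(np.array([[2], [2], [5], [1]]), np.array([[5], [2]])))
```

With the question's first example this keeps b's order, giving [[5], [2]] instead of the sorted [[2], [5]].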

Answer 2 (score: 0)

Another approach is to use broadcasting:

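import numpy as np

array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])

# test[i, j, k] is True where array2a[i, k] == array2b[j, k]; shape (5, 3, 2)
test = array2a[:, None] == array2b
# row j of array2b matches only if ALL its columns equal the SAME row i of array2a
mask = test.all(axis=2).any(axis=0)
print(array2b[mask])  # [[2 1]
                      #  [3 3]]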

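But this is less readable, IMO. It also builds an intermediate boolean array of shape (n_1, n_2, m), so memory use grows with the product of the two row counts. [Edit]: Or use a combination of np.unique and sets. In short, there are plenty of options!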
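
A set-free variant that stays entirely in NumPy (a sketch, not the answerer's code) deduplicates each array with np.unique(..., axis=0), stacks the results, and keeps the rows that then occur twice. After deduplication a row can appear twice only if it was present in both inputs:

```python
import numpy as np


def array_intersect(a, b):
    """Rows common to a and b, deduplicated and sorted (a and b must share m)."""
    # After deduplication each row appears at most once per array...
    ua = np.unique(a, axis=0)
    ub = np.unique(b, axis=0)
    # ...so in the stacked array a count of 2 means the row was in both inputs
    rows, counts = np.unique(np.concatenate([ua, ub]), axis=0, return_counts=True)
    return rows[counts > 1]


array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])
print(array_intersect(array2a, array2b))
```

Unlike the broadcasting version, this needs no (n_1, n_2, m) intermediate; the work is dominated by the sorts inside np.unique.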

Answer 3 (score: 0)

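If you have scipy installed (I haven't tested the speed), here is a way to do it without any loops or list comprehensions. cdist computes the distance between every row of the first array and every row of the second, and a distance of 0 means the two rows are identical: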

In [31]: from scipy.spatial.distance import cdist

In [32]: np.unique(array1a[np.where(cdist(array1a, array1b) == 0)[0]], axis=0)
Out[32]: 
array([[2],
       [5]])

In [33]: np.unique(array2a[np.where(cdist(array2a, array2b) == 0)[0]], axis=0)
Out[33]: 
array([[2, 1],
       [3, 3]])

Answer 4 (score: 0)

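The numpy_indexed package (disclaimer: I am its author) was created for exactly this purpose: to provide functionality like this in an expressive and efficient way: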

import numpy_indexed as npi
npi.intersect(a, b)

Note that the implementation is fully vectorized; it does not loop over the arrays in Python.