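I'm looking for a way to get the intersection of two 2D numpy.array objects of shape (n_1, m) and (n_2, m). Note that n_1 and n_2 can differ, but m is the same for both arrays. Here are two minimal examples with the expected results: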
import numpy as np
array1a = np.array([[2], [2], [5], [1]])
array1b = np.array([[5], [2]])
array_intersect(array1a, array1b)
## array([[2],
##        [5]])
array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])
array_intersect(array2a, array2b)
## array([[2, 1],
##        [3, 3]])
If anyone has an idea of how I should implement the array_intersect function, I would be very grateful!
Answer 0 (score: 1)
How about using sets?
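A minimal sketch of what a set-based approach could look like. The function name array_intersect_sets is hypothetical and not taken from the answer; the sketch assumes both inputs are 2D arrays with the same number of columns, and it returns the common rows deduplicated and in lexicographic order.

```python
import numpy as np

def array_intersect_sets(a, b):
    """Return the rows that occur in both a and b (hypothetical helper)."""
    # Rows become hashable tuples so that Python's set intersection applies.
    common = set(map(tuple, a.tolist())) & set(map(tuple, b.tolist()))
    if not common:
        # Keep the column count even when nothing is shared.
        return np.empty((0, a.shape[1]), dtype=a.dtype)
    # Sets are unordered, so sort to get a deterministic result.
    return np.array(sorted(common), dtype=a.dtype)

array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])
print(array_intersect_sets(array2a, array2b))
```

This loops over rows in Python, so it is likely to be slower than a vectorized approach on large inputs, but it is easy to read.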
Answer 1 (score: 0)
Build a set of tuples from the rows of the first array and test each row of the second array against it, or vice versa.
def array_intersect(a, b):
    s = {tuple(x) for x in a}
    return np.unique([x for x in b if tuple(x) in s], axis=0)
Answer 2 (score: 0)
Another approach is to take advantage of broadcasting:
import numpy as np
array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])
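# Compare every row of array2a with every row of array2b; shape (5, 3, 2)
test = array2a[:, None] == array2b
# Keep a row of array2b only if all of its columns match one single row of array2a
print(array2b[test.all(axis=2).any(axis=0)])  # [[2 1]
                                              #  [3 3]]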
But this is less readable, imo. [edit]: Or use a combination of unique and set. In short, there are plenty of options!
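One possible reading of the "unique" idea, as a sketch rather than what the answer's author had in mind: deduplicate each array with np.unique, stack the results, and keep the rows that then appear exactly twice. The name array_intersect_unique is hypothetical, and this variant uses only NumPy, without Python sets.

```python
import numpy as np

def array_intersect_unique(a, b):
    """Rows present in both a and b, via np.unique (hypothetical helper)."""
    # After deduplication each row occurs at most once per array ...
    ua = np.unique(a, axis=0)
    ub = np.unique(b, axis=0)
    # ... so a count of 2 in the stacked array means the row is in both.
    rows, counts = np.unique(np.concatenate([ua, ub]), axis=0, return_counts=True)
    return rows[counts == 2]

array1a = np.array([[2], [2], [5], [1]])
array1b = np.array([[5], [2]])
print(array_intersect_unique(array1a, array1b))
```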
Answer 3 (score: 0)
If you have scipy installed (I haven't tested the speed), here is a way that needs no loops or list comprehensions:
In [31]: from scipy.spatial.distance import cdist
In [32]: np.unique(array1a[np.where(cdist(array1a, array1b) == 0)[0]], axis=0)
Out[32]:
array([[2],
       [5]])
In [33]: np.unique(array2a[np.where(cdist(array2a, array2b) == 0)[0]], axis=0)
Out[33]:
array([[2, 1],
       [3, 3]])
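Answer 4 (score: 0)
The numpy_indexed package (disclaimer: I am its author) was created for exactly this purpose, to provide this kind of functionality in an expressive and efficient way: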
import numpy_indexed as npi
npi.intersect(a, b)
Note that the implementation is fully vectorized; that is, it does not loop over the arrays in Python.
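For readers who cannot install an extra package, here is a NumPy-only vectorized sketch. It is not how numpy_indexed is implemented; it is a different technique that views each row as a single structured element so that np.intersect1d can compare whole rows. The name array_intersect_view is hypothetical, and the sketch assumes both arrays are 2D, have the same number of columns, and can share one dtype.

```python
import numpy as np

def array_intersect_view(a, b):
    """Row-wise intersection via a structured-dtype view (hypothetical helper)."""
    # Views need contiguous memory and a common element type.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)
    m = a.shape[1]
    # One structured element per row, with one field per column.
    row_dtype = np.dtype([(f"f{i}", a.dtype) for i in range(m)])
    common = np.intersect1d(a.view(row_dtype), b.view(row_dtype))
    # Convert the structured result back to a plain 2D array.
    return common.view(a.dtype).reshape(-1, m)

array2a = np.array([[1, 2], [3, 3], [2, 1], [1, 3], [2, 1]])
array2b = np.array([[2, 1], [1, 4], [3, 3]])
print(array_intersect_view(array2a, array2b))
```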