Grouping pairs from combinations by a given condition

Date: 2020-10-19 10:23:38

Tags: python arrays numpy grouping combinations

Suppose I have a large amount of data, of which the following is a sample:

x = [511.31, 512.24, 571.77, 588.35, 657.08, 665.49, -1043.45, -1036.56, -969.39, -955.33]

I generate all possible pairs with the following code:

Pairs = [(x[i], x[j]) for i in range(len(x)) for j in range(i + 1, len(x))]

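This gives me all possible pairs. Now I would like to group the pairs whose two values lie within a threshold of -25 to +25 of each other, and label them accordingly. Any ideas or suggestions on how to do this? Thanks in advance.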

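2 answers:

Answer 0 (score: 1)

If I understood your question correctly, the code below should do the trick. The idea is to build a dictionary keyed by the mean of each pair and to keep appending matching pairs to it: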

import numpy as np  # numpy is used for the mean

# Your threshold
threshold = 25
# A dictionary will hold the relevant pairs, keyed by their mean
mylist = {}
for i in Pairs:
    # Check the threshold and discard the pair otherwise
    diff = abs(i[1] - i[0])

    if diff < threshold:
        # Name of the entry in the dictionary (mean truncated to an integer)
        entry = str(int(np.mean(i)))

        # If the entry already exists, append. Otherwise, create a container list
        if entry in mylist:
            mylist[entry].append(i)
        else:
            mylist[entry] = [i]

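This produces the following output (it includes the values 511.1, 512.35 and 513.35, which are not in the sample x from the question, so it was apparently generated from a slightly larger input):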

{'-1040': [(-1043.45, -1036.56)],
 '-962': [(-969.39, -955.33)],
 '511': [(511.1, 511.31),
  (511.1, 512.24),
  (511.1, 512.35),
  (511.31, 512.24),
  (511.31, 512.35)],
 '512': [(511.1, 513.35),
  (511.31, 513.35),
  (512.24, 512.35),
  (512.24, 513.35),
  (512.35, 513.35)],
 '580': [(571.77, 588.35)],
 '661': [(657.08, 665.49)]}
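
For reference, here is a minimal sketch of the same idea applied to the sample x from the question. It assumes the pairs are built with itertools.combinations and the groups are collected in a collections.defaultdict; the function name group_close_pairs is hypothetical, and the key is the mean truncated to an integer, as in the code above.

```python
from collections import defaultdict
from itertools import combinations


def group_close_pairs(values, threshold=25):
    """Group pairs whose absolute difference is below threshold.

    Hypothetical helper: keys are the pair mean truncated toward zero,
    matching str(int(np.mean(pair))) in the answer above.
    """
    groups = defaultdict(list)
    # combinations(values, 2) yields each (values[i], values[j]) with i < j
    for a, b in combinations(values, 2):
        if abs(b - a) < threshold:
            key = str(int((a + b) / 2))
            groups[key].append((a, b))
    return dict(groups)


x = [511.31, 512.24, 571.77, 588.35, 657.08,
     665.49, -1043.45, -1036.56, -969.39, -955.33]
groups = group_close_pairs(x)
for key, members in groups.items():
    print(key, members)
```

With the sample x this yields five groups of one pair each, under the keys 511, 580, 661, -1040 and -962. Because the key is a truncated mean, two close pairs can land under different keys (as with 511 and 512 in the output above), so the key is a label rather than a true cluster identifier.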

Answer 1 (score: 0)

This should be a fast way to do it:

import numpy as np
from scipy.spatial.distance import pdist

# Input data
x = np.array([511.31, 512.24, 571.77, 588.35, 657.08,
              665.49, -1043.45, -1036.56, -969.39, -955.33])
thres = 25.0
# Compute pairwise distances
# 'cityblock' is used here; the default metric, 'euclidean',
# would give the same result in 1-D but is more expensive to compute
d = pdist(x[:, np.newaxis], 'cityblock')
# Find distances within threshold
d_idx = np.where(d <= thres)[0]
# Convert "condensed" distance indices to pair of indices
r = np.arange(len(x))
c = np.zeros_like(r, dtype=np.int32)
np.cumsum(r[:0:-1], out=c[1:])
i = np.searchsorted(c[1:], d_idx, side='right')
j = d_idx - c[i] + r[i] + 1
# Get pairs of values
v_i = x[i]
v_j = x[j]
# Find means
m = np.round((v_i + v_j) / 2).astype(np.int32)
# Print result
for idx in range(len(m)):
    print(f'{m[idx]}: ({v_i[idx]}, {v_j[idx]})')

Output:

512: (511.31, 512.24)
580: (571.77, 588.35)
661: (657.08, 665.49)
-1040: (-1043.45, -1036.56)
-962: (-969.39, -955.33)
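
The step that converts "condensed" distance indices back to index pairs is the hardest part of the code above to follow. As a rough alternative sketch (not the answerer's method), np.triu_indices can produce the (i, j) index pairs directly, in the same row-major order that pdist uses, so no conversion is needed. The names mask and result are introduced here for illustration only.

```python
import numpy as np

x = np.array([511.31, 512.24, 571.77, 588.35, 657.08,
              665.49, -1043.45, -1036.56, -969.39, -955.33])
thres = 25.0

# Index pairs (i, j) with i < j, in row-major order over the upper triangle
i, j = np.triu_indices(len(x), k=1)
# Keep only the pairs whose absolute difference is within the threshold
mask = np.abs(x[i] - x[j]) <= thres
v_i, v_j = x[i[mask]], x[j[mask]]
# Label each pair with its rounded mean, as in the answer above
m = np.round((v_i + v_j) / 2).astype(np.int32)
# Collect plain Python values for printing
result = [(int(k), float(a), float(b)) for k, a, b in zip(m, v_i, v_j)]
for k, a, b in result:
    print(f'{k}: ({a}, {b})')
```

For the sample data this prints the same five lines as the output above. Like pdist, it materializes all n(n-1)/2 pairs, so memory grows quadratically with the number of values.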