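Remove the element of a list that differs greatly from the other elements

Date: 2018-02-01 07:21:48

Tags: python list filtering

Python help needed. I have a list containing the following elements: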

[[287.5 235.5  24.3]
  [287.5 297.5  24.3]
  [287.5 359.5  24.3]
  [ 56.5 151.5  25.4]]

I want to remove [56.5 151.5 25.4], because its first value, 56.5, differs greatly from that of the other rows (287.5). The result I want is:

[[287.5 235.5  24.3]
  [287.5 297.5  24.3]
  [287.5 359.5  24.3]]

I have been thinking about this for hours without coming up with a good idea. Can anyone help?

1 Answer:

Answer 0 (score: 0)

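As noted in the comments, it is not clear exactly what is being asked. You may be asking, "How do I choose the combination of three lists that minimizes the variances of the columns?"

Here is one way to do that in Python 3: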

Given

import itertools as it
import statistics as stats


a = [[287.5, 235.5,  24.3],
     [287.5, 297.5,  24.3],
     [287.5, 359.5,  24.3],
     [ 56.5, 151.5,  25.4]]

Code

def sum_of_variances(combs):
    """Return the sum of variances for the columns in each combination."""
    cols_per_combs = [list(zip(*x)) for x in combs]
    return [sum([stats.variance(col) for col in cols]) for cols in cols_per_combs]


def optimal_combination(lst):
    """Return the combination that minimizes the columnar variances."""
    combs = [x for x in it.combinations(lst, 3)]
    summed_vars = sum_of_variances(combs)
    idx = min(enumerate(summed_vars), key=lambda x: x[1])[0]
    return combs[idx]


optimal_combination(a)
# ([287.5, 235.5, 24.3], [287.5, 297.5, 24.3], [287.5, 359.5, 24.3])
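The function above hard-codes combinations of three rows. As a minimal sketch of the same idea with the group size as a parameter, the variant below takes `k` as an argument. The name `optimal_combination_k` and the parameter `k` are hypothetical, not part of the original answer, and `k` must be at least 2 because `statistics.variance` needs two or more values.

```python
import itertools as it
import statistics as stats


def optimal_combination_k(rows, k):
    """Return the k-row combination with the smallest summed column variance.

    Hypothetical generalization: k is assumed to satisfy 2 <= k <= len(rows).
    """
    def summed_variance(comb):
        # zip(*comb) transposes the chosen rows into columns.
        return sum(stats.variance(col) for col in zip(*comb))

    # min() keeps the first combination in case of a tie.
    return min(it.combinations(rows, k), key=summed_variance)


a = [[287.5, 235.5, 24.3],
     [287.5, 297.5, 24.3],
     [287.5, 359.5, 24.3],
     [56.5, 151.5, 25.4]]

best = optimal_combination_k(a, 3)
```

With `k=3` this selects the same three rows as `optimal_combination(a)`. Note that the number of combinations grows quickly with the length of the list, so this brute-force search is only practical for small inputs.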

Details

Find all combinations of three lists:

>>> combs = [x for x in it.combinations(a, 3)]
>>> combs
[([287.5, 235.5, 24.3], [287.5, 297.5, 24.3], [287.5, 359.5, 24.3]),
 ([287.5, 235.5, 24.3], [287.5, 297.5, 24.3], [56.5, 151.5, 25.4]),
 ([287.5, 235.5, 24.3], [287.5, 359.5, 24.3], [56.5, 151.5, 25.4]),
 ([287.5, 297.5, 24.3], [287.5, 359.5, 24.3], [56.5, 151.5, 25.4])]

Transpose each combination with zip to see its columns:

>>> cols_per_combs = [list(zip(*x)) for x in combs]
>>> cols_per_combs
[[(287.5, 287.5, 287.5), (235.5, 297.5, 359.5), (24.3, 24.3, 24.3)],
 [(287.5, 287.5, 56.5), (235.5, 297.5, 151.5), (24.3, 24.3, 25.4)],
 [(287.5, 287.5, 56.5), (235.5, 359.5, 151.5), (24.3, 24.3, 25.4)],
 [(287.5, 287.5, 56.5), (297.5, 359.5, 151.5), (24.3, 24.3, 25.4)]]

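We compute the variance of each column to measure how spread out its values are. Note that the first option (index 0) has the columns with the smallest variances: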

>>> variance_per_cols = [[stats.variance(col) for col in cols] for cols in cols_per_combs]
>>> variance_per_cols
[[0.0, 3844.0, 0.0],
 [17787.0, 5369.333333333333, 0.4033333333333317],
 [17787.0, 10949.333333333334, 0.4033333333333317],
 [17787.0, 11404.0, 0.4033333333333317]]
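To make these numbers concrete, here is a small sketch of the sample variance that `statistics.variance` computes: the sum of squared deviations from the mean, divided by n - 1. It is checked against the second column of combination 0. The helper name `sample_variance` is hypothetical and exists only for illustration.

```python
import statistics as stats


def sample_variance(values):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


col = (235.5, 297.5, 359.5)    # second column of combination 0
# The mean is 297.5, so the deviations are -62, 0 and 62,
# giving (3844 + 0 + 3844) / 2.
manual = sample_variance(col)
library = stats.variance(col)
```

Both `manual` and `library` should agree with the 3844.0 shown in the first row of the table above.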

If we sum these variances, we can still see which combination minimizes them, namely index 0:

>>> summed_vars = sum_of_variances(combs)
>>> summed_vars
[3844.0, 23156.736666666664, 28736.736666666668, 29191.403333333332]

optimal_combination() returns the combination that minimizes the summed variances, i.e. combs[0].
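If the goal is literally what the question describes (dropping rows whose first value is far from the rest), a simpler alternative to the combination search is to compare each row's first value with the median of that column. This is a sketch of a different technique, not the method above. The function name `drop_first_column_outliers` and the threshold `max_dev=100.0` are assumptions chosen for this data set.

```python
import statistics as stats


def drop_first_column_outliers(rows, max_dev=100.0):
    """Keep rows whose first value lies within max_dev of the column median.

    max_dev is an assumed, data-dependent threshold.
    """
    center = stats.median([row[0] for row in rows])
    return [row for row in rows if abs(row[0] - center) <= max_dev]


a = [[287.5, 235.5, 24.3],
     [287.5, 297.5, 24.3],
     [287.5, 359.5, 24.3],
     [56.5, 151.5, 25.4]]

filtered = drop_first_column_outliers(a)
```

Here the median of the first column is 287.5, so only the row starting with 56.5 is dropped. Unlike the combination search, this needs a threshold to be chosen, but it does not require knowing in advance how many rows to keep.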