Python help: I have a list containing the following elements
[[287.5 235.5 24.3]
 [287.5 297.5 24.3]
 [287.5 359.5 24.3]
 [ 56.5 151.5 25.4]]
I want to remove [56.5 151.5 25.4], because 56.5 differs greatly from the other values in that column (287.5). I want to end up with
[[287.5 235.5 24.3]
[287.5 297.5 24.3]
[287.5 359.5 24.3]]
I have been thinking about this for hours without coming up with a good idea. Can anyone help?
Answer 0 (score: 0)
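As noted in the comments, it is not clear what is being asked. You may be asking, "How do I select the combination of three lists that minimizes the variance of each column?"
Here is one way in Python 3:
Given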
import itertools as it
import statistics as stats
a = [[287.5, 235.5, 24.3],
     [287.5, 297.5, 24.3],
     [287.5, 359.5, 24.3],
     [ 56.5, 151.5, 25.4]]
Code
def sum_of_variances(combs):
    """Return the sum of variances for the columns in each combination."""
    # Transpose each combination so that its columns become tuples.
    cols_per_combs = [list(zip(*x)) for x in combs]
    return [sum(stats.variance(col) for col in cols) for cols in cols_per_combs]

def optimal_combination(lst):
    """Return the combination that minimizes the columnar variances."""
    combs = list(it.combinations(lst, 3))
    summed_vars = sum_of_variances(combs)
    # Index of the combination with the smallest summed variance.
    idx = min(enumerate(summed_vars), key=lambda x: x[1])[0]
    return combs[idx]
optimal_combination(a)
# ([287.5, 235.5, 24.3], [287.5, 297.5, 24.3], [287.5, 359.5, 24.3])
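As a side note, the same selection can probably be written more compactly by handing `min` a key function, which avoids building the intermediate list of sums. This is only a sketch: the helper name `total_variance` and the `size` parameter are introduced here for illustration and are not part of the code above.

```python
import itertools as it
import statistics as stats

a = [[287.5, 235.5, 24.3],
     [287.5, 297.5, 24.3],
     [287.5, 359.5, 24.3],
     [ 56.5, 151.5, 25.4]]


def total_variance(rows):
    """Return the sum of the sample variances of the columns of rows."""
    # zip(*rows) transposes the rows into columns.
    return sum(stats.variance(col) for col in zip(*rows))


def optimal_combination(lst, size=3):
    """Return the combination of `size` rows with the smallest summed variance.

    `size` is a hypothetical parameter; the original code hard-codes 3.
    """
    return min(it.combinations(lst, size), key=total_variance)


result = optimal_combination(a)
print(result)
# ([287.5, 235.5, 24.3], [287.5, 297.5, 24.3], [287.5, 359.5, 24.3])
```

Note that `stats.variance` needs at least two values per column, so `size` must be 2 or more.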
Details
Find all combinations of three lists:
>>> combs = [x for x in it.combinations(a, 3)]
>>> combs
[([287.5, 235.5, 24.3], [287.5, 297.5, 24.3], [287.5, 359.5, 24.3]),
([287.5, 235.5, 24.3], [287.5, 297.5, 24.3], [56.5, 151.5, 25.4]),
([287.5, 235.5, 24.3], [287.5, 359.5, 24.3], [56.5, 151.5, 25.4]),
([287.5, 297.5, 24.3], [287.5, 359.5, 24.3], [56.5, 151.5, 25.4])]
Look at the columns of every combination by zipping its rows:
>>> cols_per_combs = [list(zip(*x)) for x in combs]
>>> cols_per_combs
[[(287.5, 287.5, 287.5), (235.5, 297.5, 359.5), (24.3, 24.3, 24.3)],
[(287.5, 287.5, 56.5), (235.5, 297.5, 151.5), (24.3, 24.3, 25.4)],
[(287.5, 287.5, 56.5), (235.5, 359.5, 151.5), (24.3, 24.3, 25.4)],
[(287.5, 287.5, 56.5), (297.5, 359.5, 151.5), (24.3, 24.3, 25.4)]]
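We compute the variance of each column as a measure of how much the values in that column differ. Note that the first option (index 0) shows the columns with the smallest variances: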
>>> variance_per_cols = [[stats.variance(col) for col in cols] for cols in cols_per_combs]
>>> variance_per_cols
[[0.0, 3844.0, 0.0],
[17787.0, 5369.333333333333, 0.4033333333333317],
[17787.0, 10949.333333333334, 0.4033333333333317],
[17787.0, 11404.0, 0.4033333333333317]]
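To see where a value such as 3844.0 comes from, the sample variance of the middle column of the first combination can be worked out by hand. The sketch below assumes only that `statistics.variance` computes the sample variance (dividing by n - 1), which is its documented behavior; the variable names are chosen for illustration.

```python
import statistics as stats

col = (235.5, 297.5, 359.5)

mean = sum(col) / len(col)                      # 297.5
squared_devs = [(x - mean) ** 2 for x in col]   # [3844.0, 0.0, 3844.0]
manual = sum(squared_devs) / (len(col) - 1)     # sample variance divides by n - 1

print(manual, stats.variance(col))
# 3844.0 3844.0
```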
If we sum these variances, we can still see which combination minimizes them, namely index 0:
>>> summed_vars = sum_of_variances(combs)
>>> summed_vars
[3844.0, 23156.736666666664, 28736.736666666668, 29191.403333333332]
optimal_combination() returns the combination that minimizes the summed variance, namely combs[0].
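If the goal is simply to drop rows whose first value lies far from the others, which is another possible reading of the question, a direct filter may be enough. The following is a rough sketch under that assumption: it keeps the rows whose first column is within a tolerance of that column's median. The function name `drop_first_column_outliers` and the default `tol=50.0` are hypothetical choices made for this example, not something stated in the question.

```python
import statistics as stats

a = [[287.5, 235.5, 24.3],
     [287.5, 297.5, 24.3],
     [287.5, 359.5, 24.3],
     [ 56.5, 151.5, 25.4]]


def drop_first_column_outliers(rows, tol=50.0):
    """Keep rows whose first value is within `tol` of the first column's median.

    `tol` is an assumed threshold; pick one that suits the data.
    """
    center = stats.median([row[0] for row in rows])
    return [row for row in rows if abs(row[0] - center) <= tol]


filtered = drop_first_column_outliers(a)
print(filtered)
# [[287.5, 235.5, 24.3], [287.5, 297.5, 24.3], [287.5, 359.5, 24.3]]
```

Unlike the combination search, this looks at one column only and does not fix the number of rows to keep, so it would remove several outlying rows at once if more than one were present.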