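Suppose I have a Python array a = [3, 5, 2, 7, 5, 3, 6, 8, 4]. My goal is to iterate through this array 3 elements at a time, returning the mean of the top 2 of the three elements.
Using the array above, in my iteration step the first three elements are [3, 5, 2] and the mean of the top 2 elements is 4. The next three elements are [5, 2, 7] and the mean of the top 2 elements is 6. The next three elements are [2, 7, 5] and the mean of the top 2 elements is again 6. ...
Hence, the result for the above array would be [4, 6, 6, 6, 5.5, 7, 7].
What is the nicest way to write such a function?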
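Answer 0 (score: 14)
You can use slicing to work on subsets of the list. Simply grab each three-element sublist, find the top two elements (here by subtracting the minimum from the sum of all three), then compute the simple average (the mean) and append it to a result list.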
def get_means(input_list):
    means = []
    # range() replaces the Python 2 xrange()
    for i in range(len(input_list) - 2):
        three_elements = input_list[i:i+3]
        sum_top_two = sum(three_elements) - min(three_elements)
        means.append(sum_top_two / 2.0)
    return means
With your example input, this gives the desired result:
print(get_means([3, 5, 2, 7, 5, 3, 6, 8, 4]))
# [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
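There are some other very good answers here that focus more on performance, including one that uses a generator to avoid building large in-memory lists: https://stackoverflow.com/a/49001728/416500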
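The generator idea mentioned above can be sketched roughly as follows. This is only an illustration and not the code from the linked answer; the name `iter_means` is made up for this example, and it assumes the input supports slicing.

```python
def iter_means(input_list):
    """Yield the mean of the two largest values in each 3-element window."""
    for i in range(len(input_list) - 2):
        window = input_list[i:i + 3]
        # Dropping the minimum leaves the sum of the top two values.
        yield (sum(window) - min(window)) / 2.0


a = [3, 5, 2, 7, 5, 3, 6, 8, 4]
print(list(iter_means(a)))
# [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
```

Because values are yielded one at a time, the caller decides whether to collect them into a list or consume them lazily.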
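Answer 1 (score: 12)
I believe in splitting the code into separate parts. Here that means getting the sliding window, getting the top 2 elements, and calculating the mean. The cleanest way to do this is with generators.
Using tee, islice and zip to create the windows, a slight variation on evamicur's answer: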
import itertools

def windowed_iterator(iterable, n=2):
    iterators = itertools.tee(iterable, n)
    iterators = (itertools.islice(it, i, None) for i, it in enumerate(iterators))
    yield from zip(*iterators)

windows = windowed_iterator(iterable=a, n=3)
# the windows produced are:
# [(3, 5, 2), (5, 2, 7), (2, 7, 5), (7, 5, 3), (5, 3, 6), (3, 6, 8), (6, 8, 4)]
To calculate the mean of the top 2 you can use any of the methods from the other answers. I think the heapq one is the clearest:
from heapq import nlargest
top_n = map(lambda x: nlargest(2, x), windows)
or, equivalently:
top_n = (nlargest(2, i) for i in windows)
# [[5, 3], [7, 5], [7, 5], [7, 5], [6, 5], [8, 6], [8, 6]]
from statistics import mean
means = map(mean, top_n)
# [4, 6, 6, 6, 5.5, 7, 7]
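Putting the three steps together might look like the sketch below. The function name `top_two_means` and its parameters are assumptions for illustration; the window helper is repeated so the block runs on its own.

```python
import itertools
from heapq import nlargest
from statistics import mean


def windowed_iterator(iterable, n=2):
    # One independent iterator per window position, each advanced by its offset.
    iterators = itertools.tee(iterable, n)
    iterators = (itertools.islice(it, i, None) for i, it in enumerate(iterators))
    yield from zip(*iterators)


def top_two_means(iterable, window_size=3, top=2):
    # Hypothetical wrapper: window -> largest `top` values -> mean.
    for window in windowed_iterator(iterable, window_size):
        yield mean(nlargest(top, window))


a = [3, 5, 2, 7, 5, 3, 6, 8, 4]
print(list(top_two_means(a)))
# [4, 6, 6, 6, 5.5, 7, 7]
```

Note that every stage is lazy, so each window is built and reduced only when the final result is consumed.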
Answer 2 (score: 8)
The following code does what you need:
[sum(sorted(a[i:i + 3])[-2:]) / 2 for i in range(len(a) - 2)]
Given your a = [3, 5, 2, 7, 5, 3, 6, 8, 4], it returns:
[4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
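Answer 3 (score: 6)
itertools has a neat recipe for extracting pairs of items from any iterable, not only indexable ones. You can adapt it slightly to extract triplets instead: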
import itertools

def tripletwise(iterable):
    a, b, c = itertools.tee(iterable, 3)
    next(b, None)                          # advance b by one item
    next(itertools.islice(c, 2, 2), None)  # advance c by two items
    return zip(a, b, c)
Using that, iterating over all triplets becomes simple:
def windowed_means(iterable):
    return [
        (sum(window) - min(window)) / 2.0
        for window in tripletwise(iterable)
    ]
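As a quick check of the recipe above, the following self-contained sketch repeats both functions and runs them on the list from the question, and also on a generator, which cannot be sliced. The generator expression is just an illustrative input.

```python
import itertools


def tripletwise(iterable):
    a, b, c = itertools.tee(iterable, 3)
    next(b, None)                          # advance b by one item
    next(itertools.islice(c, 2, 2), None)  # advance c by two items
    return zip(a, b, c)


def windowed_means(iterable):
    return [(sum(w) - min(w)) / 2.0 for w in tripletwise(iterable)]


print(windowed_means([3, 5, 2, 7, 5, 3, 6, 8, 4]))
# [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]

# Works on a one-shot iterable too (squares of 0..4 here, purely as an example).
print(windowed_means(x * x for x in range(5)))
# [2.5, 6.5, 12.5]
```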
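Answer 4 (score: 3)
foslok's solution is definitely fine, but I wanted to play around and make a version with generators. It only stores a deque of length window_size as it iterates through the original list, then finds the n largest values and calculates their mean.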
import itertools as it
from collections import deque
from heapq import nlargest
from statistics import mean
def windowed(iterable, n):
    _iter = iter(iterable)
    d = deque((it.islice(_iter, n)), maxlen=n)
    yield tuple(d)
    for i in _iter:
        d.append(i)
        yield tuple(d)
a = [3, 5, 2, 7, 5, 3, 6, 8, 4]
means = [mean(nlargest(2, w)) for w in windowed(a, 3)]
print(means)
Result:
[4, 6, 6, 6, 5.5, 7, 7]
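So, to change the number of elements (the window size) or the number of largest elements, just change the arguments to the respective functions. This approach also avoids slicing, so it can be applied more easily to iterables that you can't or don't want to slice.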
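For instance, a different window size and a different number of largest values could be passed like this. The sketch repeats `windowed` so it runs standalone; the wrapper name `top_means` and the second call's parameters are assumptions for illustration.

```python
import itertools as it
from collections import deque
from heapq import nlargest
from statistics import mean


def windowed(iterable, n):
    _iter = iter(iterable)
    d = deque(it.islice(_iter, n), maxlen=n)
    yield tuple(d)
    for i in _iter:
        d.append(i)  # the oldest item falls out automatically (maxlen=n)
        yield tuple(d)


def top_means(iterable, window_size, n_largest):
    # Hypothetical helper: mean of the n_largest values in each window.
    return [mean(nlargest(n_largest, w)) for w in windowed(iterable, window_size)]


a = [3, 5, 2, 7, 5, 3, 6, 8, 4]
print(top_means(a, 3, 2))        # the original problem
print(top_means(iter(a), 4, 3))  # window of 4, top 3, fed from a plain iterator
```

One thing to be aware of: as written, `windowed` still yields a single shorter tuple when the input has fewer than n items.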
def deque_version(iterable, n, k):
    means = (mean(nlargest(n, w)) for w in windowed(iterable, k))
    for m in means:
        pass
def tee_version(iterable, n, k):
    means = (mean(nlargest(n, w)) for w in windowed_iterator(iterable, k))
    for m in means:
        pass
a = list(range(10**5))
n = 3
k = 2
print("n={} k={}".format(n, k))
print("Deque")
%timeit deque_version(a, n, k)
print("Tee")
%timeit tee_version(a, n, k)
n = 1000
k = 2
print("n={} k={}".format(n, k))
print("Deque")
%timeit deque_version(a, n, k)
print("Tee")
%timeit tee_version(a, n, k)
n = 50
k = 25
print("n={} k={}".format(n, k))
print("Deque")
%timeit deque_version(a, n, k)
print("Tee")
%timeit tee_version(a, n, k)
Result:
n=3 k=2
Deque
1.28 s ± 3.07 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Tee
1.28 s ± 16.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
n=1000 k=2
Deque
1.28 s ± 8.72 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Tee
1.27 s ± 2.92 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
n=50 k=25
Deque
2.46 s ± 10.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Tee
2.47 s ± 2.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Apparently, the choice between itertools tee and deque does not matter much here.
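The %timeit lines above are IPython magics. Outside IPython, a comparable measurement could be sketched with the standard `timeit` module as below. The input is much smaller than in the benchmark above so that it finishes quickly, and no timings are quoted because they depend on the machine. The two window helpers are copied from the answers; `run` is a made-up name.

```python
import itertools
import timeit
from collections import deque
from heapq import nlargest
from statistics import mean


def windowed(iterable, n):
    _iter = iter(iterable)
    d = deque(itertools.islice(_iter, n), maxlen=n)
    yield tuple(d)
    for i in _iter:
        d.append(i)
        yield tuple(d)


def windowed_iterator(iterable, n=2):
    iterators = itertools.tee(iterable, n)
    iterators = (itertools.islice(it, i, None) for i, it in enumerate(iterators))
    yield from zip(*iterators)


def run(window_func, iterable, n, k):
    # Hypothetical helper: mean of the n largest values in each window of size k.
    return [mean(nlargest(n, w)) for w in window_func(iterable, k)]


data = list(range(1000))

# Both window implementations should produce identical results.
assert run(windowed, data, 2, 3) == run(windowed_iterator, data, 2, 3)

for name, func in [("Deque", windowed), ("Tee", windowed_iterator)]:
    seconds = timeit.timeit(lambda: run(func, data, 2, 3), number=5)
    print("{}: {:.4f} s for 5 runs".format(name, seconds))
```

Checking that both versions agree before timing them guards against comparing two functions that do different work.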
Answer 5 (score: 1)
Using a list comprehension:
from statistics import mean
yourList=[3, 5, 2, 7, 5, 3, 6, 8, 4]
k = 3
listYouWant = [mean(x) for x in [y[1:k] for y in [sorted(yourList[z:z+k]) for z in range(len(yourList)) if z < len(yourList) -(k-1)]]]
This yields [4, 6, 6, 6, 5.5, 7, 7]
Answer 6 (score: 1)
You can try this (shown here with Python 3 integer division, //):
>>> a
[3, 5, 2, 7, 5, 3, 6, 8, 4]
>>> n
3
>>> m
2
>>> [sum(sorted(a[i*n:i*n+n])[1:])/m for i in range(len(a)//n)]
[4.0, 6.0, 7.0]
That is, step by step:
>>> a
[3, 5, 2, 7, 5, 3, 6, 8, 4]
>>> n
3
>>> [i for i in range(len(a)//n)]
[0, 1, 2]
>>> m=2
>>> [a[i*n:i*n+n] for i in range(len(a)//n)]
[[3, 5, 2], [7, 5, 3], [6, 8, 4]]
>>> [sorted(a[i*n:i*n+n]) for i in range(len(a)//n)]
[[2, 3, 5], [3, 5, 7], [4, 6, 8]]
>>> [sorted(a[i*n:i*n+n])[1:] for i in range(len(a)//n)]
[[3, 5], [5, 7], [6, 8]]
>>> [sum(sorted(a[i*n:i*n+n])[1:]) for i in range(len(a)//n)]
[8, 12, 14]
>>> [sum(sorted(a[i*n:i*n+n])[1:])/m for i in range(len(a)//n)]
[4.0, 6.0, 7.0]
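Note that the expression above walks the list in non-overlapping chunks of n, which is why it gives three values instead of the seven asked for in the question. A sketch contrasting the two index patterns, written for Python 3 (both function names are made up for illustration, and m is assumed to be at least 1):

```python
def chunk_means(a, n=3, m=2):
    # Non-overlapping chunks: start indices 0, n, 2n, ...
    return [sum(sorted(a[i:i + n])[-m:]) / m for i in range(0, len(a) - n + 1, n)]


def sliding_means(a, n=3, m=2):
    # Sliding windows: start indices 0, 1, 2, ...
    return [sum(sorted(a[i:i + n])[-m:]) / m for i in range(len(a) - n + 1)]


a = [3, 5, 2, 7, 5, 3, 6, 8, 4]
print(chunk_means(a))    # [4.0, 6.0, 7.0]
print(sliding_means(a))  # [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
```

The only difference between the two is the step of the range over start indices.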
Answer 7 (score: 1)
from statistics import mean

a=[3, 5, 2, 7, 5, 3, 6, 8, 4]
mean_list = [
    mean(x)
    for x in [
        y[1:3]
        for y in [
            sorted(a[z:z+3])
            for z in range(len(a))
            if z < len(a) -2
        ]
    ]
]
Answer 8 (score: 1)
You can also look at it from a generator perspective:
a=[3, 5, 2, 7, 5, 3, 6, 8, 4]
def gen_list():
    # len(a) - 2 windows in total, so the last window a[6:9] is included
    for i in range(0, len(a) - 2):
        yield sorted(a[i:i + 3], reverse=True)
apply_division = map(lambda x: sum(x[:2]) / len(x[:2]), gen_list())
if __name__=="__main__":
    result = list(apply_division)
    print(result)
[4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
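Answer 9 (score: 1)
You need a sliding window iterator along with the mean of the two largest elements. I will try to produce a generic solution that works with a sliding window of size n, where n is any integer greater than 1 (the mean is taken over the n - 1 largest elements of each window).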
from itertools import islice
def calculate_means(items, window_length=3):
    stop_seq = window_length - 1
    sliding_window = [sorted(islice(items[x:],window_length),reverse=True) for x in range(len(items)-stop_seq)]
    return [sum(a[:stop_seq])/stop_seq for a in sliding_window]
>>> calculate_means([3, 5, 2, 7, 5, 3, 6, 8, 4])
[4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
Answer 10 (score: 1)
For the record, here is a functional version:
>>> f=lambda values:[] if len(values)<=2 else [(sum(values[:3])-min(values[:3]))/2]+f(values[1:])
>>> f([3, 5, 2, 7, 5, 3, 6, 8, 4])
[4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
>>> f([3, 5, 2])
[4.0]
>>> f([3, 5])
[]
Answer 11 (score: 1)
Using a sliding window algorithm and the third-party more_itertools.windowed tool:
import statistics as stats
import more_itertools as mit
lst = [3, 5, 2, 7, 5, 3, 6, 8, 4]
[stats.mean(sorted(w)[1:]) for w in mit.windowed(lst, 3)]
# [4, 6, 6, 6, 5.5, 7, 7]
See also @MaartenFabré's related post.
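Answer 12 (score: 0)
To sort three numbers we need at most three comparisons. To find the lowest of three numbers we need only two, quickselect style. We also don't need to make any sublist copies: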
a,b,c
a < b
? (a < c ? a : c)
: (b < c ? b : c)
def f(A):
    means = [None] * (len(A) - 2)
    # range() and print() replace the Python 2 xrange() and print statement
    for i in range(len(A) - 2):
        if A[i] < A[i+1]:
            means[i] = (A[i+1] + A[i+2]) / 2.0 if A[i] < A[i+2] else (A[i] + A[i+1]) / 2.0
        else:
            means[i] = (A[i] + A[i+2]) / 2.0 if A[i+1] < A[i+2] else (A[i] + A[i+1]) / 2.0
    return means
print(f([3, 5, 2, 7, 5, 3, 6, 8, 4]))
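A quick way to gain confidence in the branch logic above is to compare it against a straightforward sort-based version on random inputs. The sketch below is written for Python 3 and restates the function under the made-up name `means_by_comparison`; the reference function and the random test data are assumptions added for illustration.

```python
import random


def means_by_comparison(A):
    means = [None] * max(len(A) - 2, 0)
    for i in range(len(A) - 2):
        x, y, z = A[i], A[i + 1], A[i + 2]
        if x < y:
            # The minimum is x or z.
            means[i] = (y + z) / 2.0 if x < z else (x + y) / 2.0
        else:
            # The minimum is y or z.
            means[i] = (x + z) / 2.0 if y < z else (x + y) / 2.0
    return means


def means_by_sorting(A):
    # Reference version: sort each window and average the two largest.
    return [sum(sorted(A[i:i + 3])[1:]) / 2.0 for i in range(len(A) - 2)]


random.seed(0)
for _ in range(1000):
    data = [random.randint(0, 9) for _ in range(random.randint(0, 12))]
    assert means_by_comparison(data) == means_by_sorting(data)

print(means_by_comparison([3, 5, 2, 7, 5, 3, 6, 8, 4]))
# [4.0, 6.0, 6.0, 6.0, 5.5, 7.0, 7.0]
```

The small value range (0 to 9) is chosen so that ties occur often, since equal elements are where the comparison branches are easiest to get wrong.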