This surprised me. To illustrate, I used this small piece of code to compute the mean and median of 1M random numbers:
import numpy as np
import statistics as st
import time
listofrandnum = np.random.rand(1000000,)
t = time.time()
print('mean is:', st.mean(listofrandnum))
print('time to calc mean:', time.time()-t)
print('\n')
t = time.time()
print('median is:', st.median(listofrandnum))
print('time to calc median:', time.time()-t)
The results:
mean is: 0.499866595037
time to calc mean: 2.0767598152160645
median is: 0.499721597395
time to calc median: 0.9687695503234863
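My question: why is the mean slower than the median? The median needs some sorting algorithm (i.e. comparisons), while the mean only needs a sum. Is summing slower than comparing?
I would really appreciate your insight on this.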
Answer 0 (score: 8)
The statistics module is not part of NumPy. It is a Python standard library module with a rather different design philosophy: it goes for accuracy at all costs, even for unusual input data types and extremely poorly conditioned input. Performing a sum the way the statistics module does is actually very expensive, more so than performing a sort.
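To make "poorly conditioned input" concrete, here is a small sketch. It is only an illustration of the trade-off, not the module's actual code: the rational-arithmetic loop is a simplified stand-in that assumes exact accumulation is what makes the statistics result differ from a plain float loop.

```python
from fractions import Fraction
import statistics

# Ill-conditioned input: two huge values that cancel, plus a small one.
data = [1e100, 1.0, -1e100]

# Naive float accumulation: the 1.0 is lost when it is added to 1e100.
naive_total = 0.0
for x in data:
    naive_total += x
naive_mean = naive_total / len(data)

# Exact accumulation (illustrative stand-in): each float converts to a
# Fraction without loss, so nothing is rounded until the final float().
exact_total = Fraction(0)
for x in data:
    exact_total += Fraction(x)
exact_mean = float(exact_total / len(data))

print("naive float mean:", naive_mean)
print("exact mean:      ", exact_mean)
print("statistics.mean: ", statistics.mean(data))
```

The exact version does arbitrary-precision rational arithmetic for every element, which is why this kind of accuracy costs far more per item than a floating-point add.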
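If you want an efficient mean or median on a NumPy array, use the NumPy routines:
numpy.mean(whatever)
numpy.median(whatever)
If you want to see the expensive work the statistics module goes through for a simple sum, you can look at the source code of the statistics module.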
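For a rough side-by-side comparison, something like the following could be run. The `timed` helper and the array size are my own choices for illustration, and the timings will vary by machine and Python version, so no numbers are quoted here.

```python
import statistics
import time

import numpy as np

rng = np.random.default_rng(0)
values = rng.random(100_000)  # smaller than the question's 1M to keep it quick


def timed(func, data):
    """Hypothetical helper: return (result, elapsed seconds) for func(data)."""
    start = time.perf_counter()
    result = func(data)
    return result, time.perf_counter() - start


candidates = [
    ("statistics.mean", statistics.mean),
    ("numpy.mean", np.mean),
    ("statistics.median", statistics.median),
    ("numpy.median", np.median),
]

results = {}
for name, func in candidates:
    results[name] = timed(func, values)
    value, elapsed = results[name]
    print(f"{name:18s} value={value:.6f} time={elapsed:.4f}s")
```

Both libraries should agree on the values to within floating-point rounding; only the time spent differs.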