I get different results when computing statistics with pandas and with boost::accumulators, and I am not sure why.
Below is a simple example that uses pandas to compute the mean and variance of some returns:
import pandas
vals = [ 1, 1, 2, 1, 3, 2, 3, 4, 6, 3, 2, 1 ]
rets = pandas.Series(vals).pct_change()
print(f'count: {len(rets)}')
print(f'mean: {rets.mean()}')
print(f'variance: {rets.var()}')
This outputs:
count: 12
mean: 0.19696969696969696
variance: 0.6156565656565657
I am using boost::accumulators to do the same statistical calculation in C++:
#include <iostream>
#include <iomanip>
#include <cmath>
#include <boost/accumulators/accumulators.hpp>
#include <boost/accumulators/statistics/stats.hpp>
#include <boost/accumulators/statistics/count.hpp>
#include <boost/accumulators/statistics/mean.hpp>
#include <boost/accumulators/statistics/variance.hpp>
namespace acc = boost::accumulators;
int main()
{
    acc::accumulator_set<double, acc::stats<acc::tag::count,
                                            acc::tag::mean,
                                            acc::tag::variance>> stats;
    double prev = NAN;
    for (double val : { 1, 1, 2, 1, 3, 2, 3, 4, 6, 3, 2, 1 })
    {
        const double ret = (val - prev) / prev;
        stats(std::isnan(ret) ? 0 : ret);
        prev = val;
    }
    std::cout << std::setprecision(16)
              << "count: " << acc::count(stats) << '\n'
              << "mean: " << acc::mean(stats) << '\n'
              << "variance: " << acc::variance(stats) << '\n';
    return 0;
}
This outputs:
count: 12
mean: 0.1805555555555556
variance: 0.5160108024691359
Answer 0 (score: 1)
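In pandas, mean drops NaN values by default. Because you call pct_change, the first item is NaN, so pandas averages over 11 values while your C++ code replaces that NaN with 0 and averages over 12. If we fill the NaN with 0, the output is the same:
rets.mean()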
Out[67]: 0.19696969696969696
rets.fillna(0).mean()
Out[69]: 0.18055555555555555
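The mismatch in the mean can be checked directly. The following is a minimal sketch, assuming pandas is imported as pd; the variable names (first_is_nan, n_total, n_valid, and so on) are illustrative and not part of the original post. It shows that len() counts the NaN slot while count() and mean() skip it, which is why the question's "count: 12" is misleading for pandas.

```python
import pandas as pd

vals = [1, 1, 2, 1, 3, 2, 3, 4, 6, 3, 2, 1]
rets = pd.Series(vals).pct_change()

# The first return has no predecessor, so pct_change() leaves it as NaN.
first_is_nan = bool(rets.isna().iloc[0])

# len() counts every slot, including the NaN; count() skips NaN values,
# and mean() divides by that smaller number.
n_total = len(rets)
n_valid = int(rets.count())

mean_skipna = rets.mean()            # NaN skipped: sum / 11
mean_filled = rets.fillna(0).mean()  # NaN replaced by 0: sum / 12, as in the C++ loop

print(f"first is NaN: {first_is_nan}, len: {n_total}, count: {n_valid}")
print(f"mean (skip NaN): {mean_skipna}")
print(f"mean (fill 0):   {mean_filled}")
```

Printing rets.count() instead of len(rets) in the question's script would have reported the number of values pandas actually used.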
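As for var, set the degrees of freedom to 0 (ddof=0). pandas defaults to ddof=1, which divides by n - 1, whereas the boost result corresponds to dividing by n.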
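To make the variance comparison concrete, here is a short sketch under the same assumptions (pandas imported as pd, illustrative variable names). It combines both adjustments, filling the leading NaN with 0 and passing ddof=0, and should reproduce the variance printed by the C++ program in the question up to floating-point rounding.

```python
import pandas as pd

vals = [1, 1, 2, 1, 3, 2, 3, 4, 6, 3, 2, 1]
rets = pd.Series(vals).pct_change()

# pandas default: skip the NaN and use ddof=1 (divide by n - 1 = 10).
var_pandas_default = rets.var()

# Mirror the C++ program: treat the first return as 0
# and use ddof=0 (divide by n = 12).
var_like_boost = rets.fillna(0).var(ddof=0)

print(f"pandas default:      {var_pandas_default}")
print(f"fillna(0) + ddof=0:  {var_like_boost}")
```

Which convention is appropriate depends on the data: dropping the undefined first return (for example with rets.dropna()) and not feeding it to the accumulator on the C++ side is an alternative way to make the two programs agree.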