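I am computing a coskew matrix and wanted to double-check my calculation against pandas' built-in skew method. I cannot reconcile my result with how pandas performs the calculation.
Define my series as: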
import pandas as pd
series = pd.Series(
{0: -0.051917457635120283,
1: -0.070071606515280632,
2: -0.11204865874074735,
3: -0.14679988245503134,
4: -0.088062467095565145,
5: 0.17579741198527793,
6: -0.10765856028420773,
7: -0.11971470229167547,
8: -0.15169210769159247,
9: -0.038616800990881606,
10: 0.16988162977411481,
11: 0.092999418364443032}
)
I compared the following two calculations and expected them to be identical.
series.skew()
1.1119637586658944
(((series - series.mean()) / series.std(ddof=0)) ** 3).mean()
0.967840223081231
These differ significantly. I thought pandas might be using the Fisher-Pearson coefficient, so I tried:
n = len(series)
skew = series.sub(series.mean()).div(series.std(ddof=0)).apply(lambda x: x ** 3).mean()
skew * (n * (n - 1)) ** 0.5 / (n - 1)
1.0108761442417222
That is still quite a bit off.
How does pandas calculate skew?
Answer 0 (score: 7)
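I found that scipy.stats.skew with the parameter bias=False returns the same output, so I think pandas skew defaults to the equivalent of bias=False:
bias : bool. If False, then the calculations are corrected for statistical bias.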
import pandas as pd
from scipy import stats  # the scipy.stats.stats namespace is deprecated
series = pd.Series(
{0: -0.051917457635120283,
1: -0.070071606515280632,
2: -0.11204865874074735,
3: -0.14679988245503134,
4: -0.088062467095565145,
5: 0.17579741198527793,
6: -0.10765856028420773,
7: -0.11971470229167547,
8: -0.15169210769159247,
9: -0.038616800990881606,
10: 0.16988162977411481,
11: 0.092999418364443032}
)
print (series.skew())
1.11196375867
print (stats.skew(series, bias=False))
1.1119637586658944
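As a cross-check in the other direction, the sketch below suggests that the manual calculation from the question corresponds to scipy's default, uncorrected estimate (bias=True). The variable names `biased` and `manual` are only illustrative.

```python
import pandas as pd
from scipy import stats

series = pd.Series([
    -0.051917457635120283, -0.070071606515280632, -0.11204865874074735,
    -0.14679988245503134, -0.088062467095565145, 0.17579741198527793,
    -0.10765856028420773, -0.11971470229167547, -0.15169210769159247,
    -0.038616800990881606, 0.16988162977411481, 0.092999418364443032,
])

# scipy's default (bias=True) is the uncorrected moment ratio m3 / m2**1.5
biased = stats.skew(series, bias=True)

# The question's manual calculation: mean of cubed z-scores, population std
manual = (((series - series.mean()) / series.std(ddof=0)) ** 3).mean()

print(biased)
print(manual)
```

Both values should agree up to floating-point rounding, which would mean the question's first formula is the biased estimator and pandas reports the bias-corrected one.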
I'm not 100% sure, but I think I found it in the source code of scipy.stats.skew:
    if not bias:
        can_correct = (n > 2) & (m2 > 0)
        if can_correct.any():
            m2 = np.extract(can_correct, m2)
            m3 = np.extract(can_correct, m3)
            nval = ma.sqrt((n-1.0)*n)/(n-2.0)*m3/m2**1.5
            np.place(vals, can_correct, nval)
    return vals
So the adjustment factor is (n * (n - 1)) ** 0.5 / (n - 2)
rather than (n * (n - 1)) ** 0.5 / (n - 1).
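To make the correction concrete, here is a minimal sketch that applies the (n - 2) factor by hand and compares the result with pandas. The helper `adjusted_skew` is a hypothetical name for illustration, not a pandas or scipy function, and it assumes a series with no missing values.

```python
import math
import pandas as pd

def adjusted_skew(values: pd.Series) -> float:
    """Bias-corrected sample skewness (hypothetical helper, not a pandas API)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    centered = values - values.mean()
    m2 = (centered ** 2).mean()  # biased second central moment
    m3 = (centered ** 3).mean()  # biased third central moment
    if m2 == 0:
        raise ValueError("variance is zero, skewness is undefined")
    g1 = m3 / m2 ** 1.5          # uncorrected skewness
    # Correction factor uses (n - 2) in the denominator, not (n - 1)
    return math.sqrt(n * (n - 1)) / (n - 2) * g1

series = pd.Series([
    -0.051917457635120283, -0.070071606515280632, -0.11204865874074735,
    -0.14679988245503134, -0.088062467095565145, 0.17579741198527793,
    -0.10765856028420773, -0.11971470229167547, -0.15169210769159247,
    -0.038616800990881606, 0.16988162977411481, 0.092999418364443032,
])

print(adjusted_skew(series))
print(series.skew())
```

With n = 12 the factor is sqrt(132) / 10, roughly 1.149, which scales the uncorrected 0.9678 up to about 1.112, the value pandas reports.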