How to winsorize by year in Python

Asked: 2018-05-12 03:58:08

Tags: python python-3.x

This is my dataset, called t:

    year    roe
33  1987    0.114370
34  1988    0.141086
35  1989    0.144621
36  1990    0.135348
37  1991    0.076381

I am trying to winsorize roe within each year, but every value turns into NaN, like this:

from scipy.stats.mstats import winsorize
grouped=t.groupby('year')
t['roe_w']=grouped['roe'].apply(winsorize,limits=[0.01,0.01])

Result:

t.roe_w.head()
33    NaN
34    NaN
35    NaN
36    NaN
37    NaN
Name: roe_w, dtype: object

I tried defining a new function to apply, but got the same result. I cannot winsorize the groupby object directly.

If I do it this way instead, I don't know how to add the result back to the original DataFrame:

t.groupby('year').roe.apply(winsorize,limits=[0.01,0.01])
Out[57]: 
year
1965    [0.14415390338, 0.117462079171, 0.128658847171...
1966    [0.150060493701, nan, 0.12508992087, 0.1676523...
1967    [0.169752998571, nan, 0.128173648284, 0.117999...
1968    [0.120201520849, nan, 0.162525114893, 0.137850...
1969    [0.112350791976, 0.114198765054, 0.18420285951...
1970    [0.0939597998646, 0.181250926338, 0.0782947478...
1971    [0.0790746582545, 0.184407887041, 0.0254160825...
1972    [0.0930312074128, 0.201296434986, 0.0800409603...
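A likely explanation for the NaN column: apply returns one array per year, indexed by year, so assigning it to t['roe_w'] aligns year labels (1965, 1966, ...) against the row labels of t (33, 34, ...), and nothing matches. A minimal sketch of one way around this, assuming groupby(...).transform is acceptable, since it returns a result aligned to the original rows. The helper name winsorize_group and the sample frame are made up for illustration, and the 25% limits are only there so the effect is visible on ten rows. Missing values are skipped before calling winsorize, which is an assumption about the desired behavior.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize


def winsorize_group(s, limits=(0.01, 0.01)):
    """Winsorize the non-missing values of one group, keeping NaN in place.

    Hypothetical helper for illustration; not part of pandas or scipy.
    """
    out = s.astype(float).copy()
    valid = out.notna()
    if valid.any():
        # winsorize returns a masked array; np.asarray gives the plain values.
        out[valid] = np.asarray(winsorize(out[valid].to_numpy(), limits=limits))
    return out


# Made-up sample data with a non-default index, similar in shape to t.
t = pd.DataFrame(
    {
        "year": [1987] * 5 + [1988] * 5,
        "roe": [0.01, 0.11, 0.12, 0.13, 0.90, 0.02, 0.14, np.nan, 0.15, 0.80],
    },
    index=range(33, 43),
)

# transform keeps the original row index, so the assignment lines up.
t["roe_w"] = t.groupby("year")["roe"].transform(winsorize_group, limits=(0.25, 0.25))
print(t)
```

With real data the call would use limits=(0.01, 0.01), matching the limits in the question.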

I also tried the function below. It works, but the results differ slightly from Stata's, so I still need scipy's winsorize to get the same results as Stata.

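import pandas as pd

def winsorize_series(se):
    # Cut-offs at the 1st and 99th percentiles (pandas interpolates linearly).
    q = se.quantile([0.01, 0.99])
    if isinstance(q, pd.Series) and len(q) == 2:
        # Note: this modifies se in place.
        se[se < q.iloc[0]] = q.iloc[0]
        se[se > q.iloc[1]] = q.iloc[1]
    return se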
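The gap between this quantile-based function and scipy's winsorize probably comes from how the cut-offs are chosen. pandas' quantile interpolates between observations by default, while winsorize replaces the extreme values with an actual observation from the data. The sketch below compares the two on a made-up series of 200 values. It says nothing about which rule Stata follows, since that depends on the Stata command used.

```python
import numpy as np
import pandas as pd
from scipy.stats.mstats import winsorize

# Made-up data: the values 1.0, 2.0, ..., 200.0.
s = pd.Series(np.arange(1, 201, dtype=float))

# Quantile-based clipping: the cut-offs are interpolated, so they can fall
# between two observations (here roughly 2.99 and 198.01).
q = s.quantile([0.01, 0.99])
by_quantile = s.clip(lower=q.iloc[0], upper=q.iloc[1])

# scipy's winsorize: the lowest and highest 1% are replaced by the nearest
# remaining observation (here 3.0 and 198.0).
by_scipy = pd.Series(
    np.asarray(winsorize(s.to_numpy(), limits=[0.01, 0.01])), index=s.index
)

print(by_quantile.min(), by_quantile.max())
print(by_scipy.min(), by_scipy.max())
```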

Help, please! QAQ

0 answers:

No answers yet.