Grouping a DataFrame by a specified time frame in pandas

Time: 2019-06-19 11:39:58

Tags: python pandas numpy

I have a DataFrame similar to the following:

   detaildate   detailquantity
0   5/6/2014    8550
1   5/8/2014    0
2   3/3/2015    -3250
3   4/14/2015   -3250
4   5/19/2015   3250
5   5/20/2015   -1200
6   2/22/2016   40000
7   4/23/2016   -4500
8   5/23/2016   -2500
9   5/30/2016   -5000
10  4/3/2017    -4750
11  6/5/2017    -2000

Now I want to group the data by a given time frame. For example, if I group it by year, I expect the following result:

   detaildate   detailquantity
0   5/6/2014    8550
1   5/8/2014    0
   detaildate   detailquantity
0   3/3/2015    -3250
1   4/14/2015   -3250
2   5/19/2015   3250
3   5/20/2015   -1200
   detaildate   detailquantity
0   2/22/2016   40000
1   4/23/2016   -4500
2   5/23/2016   -2500
3   5/30/2016   -5000
   detaildate   detailquantity
0   4/3/2017    -4750
1   6/5/2017    -2000

I wrote the following code for this:

S = pd.to_datetime(df.detaildate)
for i, g in df.groupby([(S - S[0]).astype('timedelta64[Y]')]):
    print (g.reset_index(drop=True))

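However, instead of grouping by calendar year, this groups the rows into 1-year intervals counted from the first date. The result I get is: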

   detaildate   detailquantity
0   5/6/2014    8550
1   5/8/2014    0
2   3/3/2015    -3250
3   4/14/2015   -3250
   detaildate   detailquantity
0   5/19/2015   3250
1   5/20/2015   -1200
2   2/22/2016   40000
3   4/23/2016   -4500
   detaildate   detailquantity
0   5/23/2016   -2500
1   5/30/2016   -5000
2   4/3/2017    -4750
   detaildate   detailquantity
0   6/5/2017    -2000

How can I fix this?

In addition, I would like to wrap the code above in a function that takes the time frame (M, Y, W, D) as a parameter, like this:

def groupData(df,timeFrame):
    S = pd.to_datetime(df.detaildate)
    #pass timeFrame as parameter below instead of hardcoded Y
    for i, g in df.groupby([(S - S[0]).astype('timedelta64[Y]')]):
        print (g.reset_index(drop=True))

How can I replace the hardcoded Y above with my function's timeFrame parameter?

1 answer:

Answer 0: (score: 0)

Use Series.dt.year as the groupby key:

# Uncomment if detaildate is still stored as strings rather than datetimes:
# df.detaildate = pd.to_datetime(df.detaildate)
for i, g in df.groupby(df.detaildate.dt.year):
    print(g.reset_index(drop=True))
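
For the second part of the question (passing the time frame as a parameter), one possible approach is to convert the dates to calendar periods with Series.dt.to_period and group on those. This is only a sketch and is not part of the original answer. The function name group_by_time_frame and its date_col parameter are made up for illustration, and the sample data is copied from the question. It assumes the dates are month/day/year strings and that time_frame is a pandas period alias such as "Y", "M", "W" or "D".

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "detaildate": ["5/6/2014", "5/8/2014", "3/3/2015", "4/14/2015",
                   "5/19/2015", "5/20/2015", "2/22/2016", "4/23/2016",
                   "5/23/2016", "5/30/2016", "4/3/2017", "6/5/2017"],
    "detailquantity": [8550, 0, -3250, -3250, 3250, -1200,
                       40000, -4500, -2500, -5000, -4750, -2000],
})


def group_by_time_frame(df, time_frame, date_col="detaildate"):
    """Split df into calendar-aligned groups (hypothetical helper).

    time_frame is a pandas period alias such as "Y", "M", "W" or "D".
    Returns a list of DataFrames, one per period, in chronological order.
    """
    # Assumption: dates are month/day/year strings, as in the question.
    dates = pd.to_datetime(df[date_col], format="%m/%d/%Y")
    # Map every date to the calendar period (year, month, ...) it falls in.
    periods = dates.dt.to_period(time_frame)
    return [g.reset_index(drop=True) for _, g in df.groupby(periods)]


yearly = group_by_time_frame(df, "Y")
for g in yearly:
    print(g)
```

Because the grouping key is a calendar period rather than an offset from the first row, 3/3/2015 and 4/14/2015 end up in the 2015 group instead of being lumped together with the 2014 rows.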