I have a DataFrame that looks like this:
detaildate detailquantity
0 5/6/2014 8550
1 5/8/2014 0
2 3/3/2015 -3250
3 4/14/2015 -3250
4 5/19/2015 3250
5 5/20/2015 -1200
6 2/22/2016 40000
7 4/23/2016 -4500
8 5/23/2016 -2500
9 5/30/2016 -5000
10 4/3/2017 -4750
11 6/5/2017 -2000
Now I want to group the data by some time frame. For example, if I group it by year, I expect the following result:
detaildate detailquantity
0 5/6/2014 8550
1 5/8/2014 0
detaildate detailquantity
0 3/3/2015 -3250
1 4/14/2015 -3250
2 5/19/2015 3250
3 5/20/2015 -1200
detaildate detailquantity
0 2/22/2016 40000
1 4/23/2016 -4500
2 5/23/2016 -2500
3 5/30/2016 -5000
detaildate detailquantity
0 4/3/2017 -4750
1 6/5/2017 -2000
I wrote the following code for this:
S = pd.to_datetime(df.detaildate)
for i, g in df.groupby([(S - S[0]).astype('timedelta64[Y]')]):
    print(g.reset_index(drop=True))
However, instead of grouping by calendar year, this groups in 1-year intervals counted from the first date. The result I get is:
detaildate detailquantity
0 5/6/2014 8550
1 5/8/2014 0
2 3/3/2015 -3250
3 4/14/2015 -3250
detaildate detailquantity
0 5/19/2015 3250
1 5/20/2015 -1200
2 2/22/2016 40000
3 4/23/2016 -4500
detaildate detailquantity
0 5/23/2016 -2500
1 5/30/2016 -5000
2 4/3/2017 -4750
detaildate detailquantity
0 6/5/2017 -2000
How can I fix this?
Also, I would like to wrap the code above in a function that takes the time frame (M, Y, W, D) as a parameter, like this:
def groupData(df, timeFrame):
    S = pd.to_datetime(df.detaildate)
    # pass timeFrame as parameter below instead of hardcoded Y
    for i, g in df.groupby([(S - S[0]).astype('timedelta64[Y]')]):
        print(g.reset_index(drop=True))
How can I replace the hard-coded Y above with my function's timeFrame parameter?
Answer 0 (score: 0)
Use series.dt.year as the groupby key (year is an attribute, so it takes no parentheses):
# The .dt accessor needs datetime values; skip this line if the column is already datetime
df.detaildate = pd.to_datetime(df.detaildate)
for i, g in df.groupby(df.detaildate.dt.year):
    print(g.reset_index(drop=True))
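For the second part of the question (passing the time frame as a parameter), one possible approach is to convert the dates to calendar periods with Series.dt.to_period, which accepts a frequency string such as "Y", "M", "W" or "D". The following is only a sketch and not part of the original answer: the function name group_data, the explicit date format, and the sample frame rebuilt from the question's data are assumptions made for illustration.

```python
import pandas as pd


def group_data(df, time_frame):
    """Split df into one sub-frame per calendar period.

    time_frame is a pandas period frequency string: "Y", "M", "W" or "D".
    """
    # Parse the date strings without modifying the original column.
    dates = pd.to_datetime(df["detaildate"], format="%m/%d/%Y")
    # Map each date to the calendar period that contains it (e.g. 2015 for "Y").
    periods = dates.dt.to_period(time_frame)
    # Group by period; only periods that actually contain rows are returned.
    return [g.reset_index(drop=True) for _, g in df.groupby(periods)]


# Sample data copied from the question.
df = pd.DataFrame({
    "detaildate": [
        "5/6/2014", "5/8/2014", "3/3/2015", "4/14/2015", "5/19/2015",
        "5/20/2015", "2/22/2016", "4/23/2016", "5/23/2016", "5/30/2016",
        "4/3/2017", "6/5/2017",
    ],
    "detailquantity": [
        8550, 0, -3250, -3250, 3250, -1200,
        40000, -4500, -2500, -5000, -4750, -2000,
    ],
})

for group in group_data(df, "Y"):
    print(group)
```

Calling group_data(df, "M") or group_data(df, "W") groups by calendar month or week in the same way. Because the keys are calendar periods rather than offsets from the first row, the groups no longer depend on the start date.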