I have the following DataFrame:
import pandas as pd

url = 'https://raw.githubusercontent.com/108michael/ms_thesis/master/pacs.can.cl.abbridged'
df = pd.read_csv(url)
# Index by the parsed 'date' column; the column itself is kept
df = df.set_index(pd.to_datetime(df['date']))
df.head(3)
            cycle      pacid        cid  amount        date catcode type di  feccandid  amtsum
date
2010-10-13   2010  C00000901  N00031317    1000  2010-10-13   B2000  24K  D  H0FL19080    3000
2009-03-23   2010  C00082917  N00027464    5000  2009-03-23   B1000  24K  D  H6IA01098    3500
2009-05-13   2010  C00034405  N00024875    1000  2009-05-13   A5200  24K  D  H2IL08088    2000
Next I perform the groupby:
df['amtsum'] = df.groupby([pd.Grouper(level='date', freq='A'), 'catcode',\
'type', 'pacid', 'di', 'feccandid']).amount.transform('sum')
            cycle      pacid        cid  amount        date catcode type di  feccandid  amtsum
date
2010-10-13   2010  C00000901  N00031317    1000  2010-10-13   B2000  24K  D  H0FL19080    3000
2009-03-23   2010  C00082917  N00027464    5000  2009-03-23   B1000  24K  D  H6IA01098    3500
2009-05-13   2010  C00034405  N00024875    1000  2009-05-13   A5200  24K  D  H2IL08088    2000
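I would like the date index to end at year end, e.g. 2010-12-31. I asked this question before and got a working solution. Unfortunately, now that I am revisiting this part of my code, that solution no longer works. I have also tried the following: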
df['amtsum'] = df.groupby([pd.TimeGrouper('12M', closed='left'), 'catcode',\
'type', 'pacid', 'di', 'feccandid']).amount.transform('sum')
df.head(3)
            cycle      pacid        cid  amount        date catcode type di  feccandid  amtsum
date
2010-10-13   2010  C00000901  N00031317    1000  2010-10-13   B2000  24K  D  H0FL19080    1000
2009-03-23   2010  C00082917  N00027464    5000  2009-03-23   B1000  24K  D  H6IA01098    3500
2009-05-13   2010  C00034405  N00024875    1000  2009-05-13   A5200  24K  D  H2IL08088    1000
But the result is still not what I want. Does anyone have any insight into this?
Answer 0 (score: 1)
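import datetime as dt

import numpy as np
import pandas as pd

# 200 month-end timestamps starting in February 2014
index = pd.date_range(start=dt.date(2014, 2, 4), periods=200, freq='1M')
data = np.random.random(200)
df = pd.DataFrame(data, index=index, columns=["col1"])
# pd.TimeGrouper was removed in pandas 1.0; pd.Grouper(freq='A') is the equivalent
group = pd.Grouper(freq='A')
grouped = df.groupby(group)
for key, g in grouped:
    print(key)  # each key is the year-end timestamp of its group
example = grouped.mean()
print(example.head(3))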
This gives:
>>
2014-12-31 00:00:00
2015-12-31 00:00:00
2016-12-31 00:00:00
2017-12-31 00:00:00
2018-12-31 00:00:00
2019-12-31 00:00:00
....
col1
2014-12-31 0.602693
2015-12-31 0.427651
2016-12-31 0.630363
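You can always iterate over the groups and aggregate the results by hand. Looking more closely, though, you appear to be passing '12M' to the time grouper, whereas what you need is 'A' (annual, anchored at year end).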
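To tie this back to the question's frame, here is a minimal sketch of two ways to end up with a year-end index. It makes several assumptions: the toy data is made up and only keeps `catcode` and `amount`, an explicit `pd.offsets.YearEnd()` object is passed as the grouper frequency in place of the 'A' alias, and the names `year_end`, `yearly` and `rows` are mine. Note that `transform` keeps the original row index by design, so the year-end labels only appear if you aggregate or relabel the index yourself.

```python
import pandas as pd

# Toy data standing in for the question's frame (values are made up)
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2009-03-23", "2009-05-13", "2010-03-01", "2010-10-13"]
        ),
        "catcode": ["B1000", "A5200", "B2000", "B2000"],
        "amount": [5000, 1000, 2000, 1000],
    }
).set_index("date")

# An explicit YearEnd offset plays the role of the 'A' alias
year_end = pd.Grouper(level="date", freq=pd.offsets.YearEnd())

# Option 1: aggregate, giving one row per (year-end date, catcode)
yearly = df.groupby([year_end, "catcode"])["amount"].sum()

# Option 2: keep every row via transform, then roll each index date
# forward to 31 December of its year
rows = df.copy()
rows["amtsum"] = rows.groupby([year_end, "catcode"])["amount"].transform("sum")
rows.index = rows.index + pd.offsets.YearEnd(0)

print(yearly)
print(rows)
```

Option 1 is probably closer to what the answer above demonstrates with `grouped.mean()`; Option 2 may suit you better if you need `amtsum` on every original row, as in the question.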