I'm trying to add two rows together within a group, for example:
ID DATE NUMBER
1 2012-10-11 5
1 2012-10-12 4
1 2012-10-13 3
2 2012-10-11 2
2 2012-10-12 1
2 2012-10-13 6
I just want to add the 2012-10-13 value to the 2012-10-12 value, then delete the 2012-10-13 row. Final result:
ID DATE NUMBER
1 2012-10-11 5
1 2012-10-12 7 (4+3)
2 2012-10-11 2
2 2012-10-12 7 (6+1)
Answer 0 (score: 3)
Use replace to swap the date, then groupby and sum:
df.replace({'DATE': {'2012-10-13': '2012-10-12'}}) \
.groupby(['ID', 'DATE'], as_index=False).sum()
ID DATE NUMBER
0 1 2012-10-11 5
1 1 2012-10-12 7
2 2 2012-10-11 2
3 2 2012-10-12 7
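For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is my own reconstruction of the question's sample data, and it assumes DATE holds plain strings rather than datetimes; the name `result` is just for illustration.

```python
import pandas as pd

# Sample data copied from the question; DATE is assumed to hold plain strings.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'DATE': ['2012-10-11', '2012-10-12', '2012-10-13'] * 2,
    'NUMBER': [5, 4, 3, 2, 1, 6],
})

# Relabel 2012-10-13 as 2012-10-12, then sum NUMBER within each (ID, DATE) pair.
result = (
    df.replace({'DATE': {'2012-10-13': '2012-10-12'}})
      .groupby(['ID', 'DATE'], as_index=False)['NUMBER']
      .sum()
)
print(result)
```

Note that replace returns a new frame here, so the original df is left untouched.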
Answer 1 (score: 1)
import pandas as pd
## First change the date
for i in range(len(df)):
    if df.loc[i, 'DATE'] == "2012-10-13":
        df.loc[i, 'DATE'] = "2012-10-12"
## Then do a groupby sum
df = pd.DataFrame({'SUM' : df.groupby(['ID','DATE'])['NUMBER'].sum()})
My output:
SUM
ID DATE
1 2012-10-11 5
2012-10-12 7
2 2012-10-11 2
2012-10-12 7
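The row-by-row loop above can probably be replaced with a single boolean-mask assignment, which tends to be faster on larger frames. This is only a sketch: the sample frame is rebuilt from the question, DATE is assumed to be a string column, and `summed` is an illustrative name.

```python
import pandas as pd

# Sample data copied from the question; DATE is assumed to hold plain strings.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'DATE': ['2012-10-11', '2012-10-12', '2012-10-13'] * 2,
    'NUMBER': [5, 4, 3, 2, 1, 6],
})

# Change the date on every matching row at once instead of looping.
df.loc[df['DATE'] == '2012-10-13', 'DATE'] = '2012-10-12'

# Group by ID and DATE, sum NUMBER, and name the resulting column SUM.
summed = df.groupby(['ID', 'DATE'])['NUMBER'].sum().to_frame('SUM')
print(summed)
```

As in the loop version, the result is indexed by (ID, DATE); call reset_index() on it if you want those back as ordinary columns.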
Answer 2 (score: 0)
Warning: the code below gets the job done, but it is not self-explanatory!
import pandas as pd

# Group by ID, excluding the rows for 2012-10-11
df1 = df.loc[df.DATE != '2012-10-11']
# 1 - df1.groupby('ID', as_index=False)['NUMBER'].sum() -> sum of NUMBER per ID
#     (NUMBER is selected explicitly so the DATE strings are not summed too)
# 2 - df1.drop('NUMBER', axis=1) -> drop the NUMBER column to avoid overlapping columns
# 3 - merge the two frames so every remaining row carries the sum for its ID
df1 = df1.drop('NUMBER', axis=1).merge(
    df1.groupby('ID', as_index=False)['NUMBER'].sum(), on='ID')
# Put the rows for 2012-10-11 back (DataFrame.append was removed in pandas 2.0)
df1 = pd.concat([df1, df.loc[df.DATE == '2012-10-11']], sort=True)
df1 = df1.sort_values(['ID', 'DATE'])
# Delete the rows that are no longer wanted
df1 = df1.loc[df1.DATE != '2012-10-13']
print(df1)
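The drop-and-merge step above is essentially broadcasting a per-ID sum back onto each row, which groupby(...).transform('sum') can do directly. A rough sketch of the same idea follows; the sample frame is rebuilt from the question, and the names `first_day`, `rest` and `out` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; DATE is assumed to hold plain strings.
df = pd.DataFrame({
    'ID': [1, 1, 1, 2, 2, 2],
    'DATE': ['2012-10-11', '2012-10-12', '2012-10-13'] * 2,
    'NUMBER': [5, 4, 3, 2, 1, 6],
})

# Split off the 2012-10-11 rows, which must keep their original values.
first_day = df['DATE'] == '2012-10-11'
rest = df.loc[~first_day].copy()

# Broadcast the per-ID sum onto every remaining row (no merge needed).
rest['NUMBER'] = rest.groupby('ID')['NUMBER'].transform('sum')

# Recombine, sort, and drop the 2012-10-13 rows that were folded in.
out = pd.concat([df.loc[first_day], rest]).sort_values(['ID', 'DATE'])
out = out.loc[out['DATE'] != '2012-10-13'].reset_index(drop=True)
print(out)
```

Like the merge version, this sums every row that is not 2012-10-11, so it only matches the desired result when 2012-10-12 and 2012-10-13 are the only other dates present.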