Pandas: add only two rows within a group

Date: 2018-11-02 12:15:23

Tags: python python-3.x pandas pandas-groupby

I'm trying to add two rows together within each group, for example:

ID  DATE        NUMBER
1   2012-10-11  5
1   2012-10-12  4
1   2012-10-13  3
2   2012-10-11  2
2   2012-10-12  1
2   2012-10-13  6

Within each ID, I only want to add the 2012-10-13 value to the 2012-10-12 value and then delete the 2012-10-13 row. Desired result:

ID  DATE        NUMBER
1   2012-10-11  5
1   2012-10-12  7 (4+3)
2   2012-10-11  2
2   2012-10-12  7 (6+1)
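To run the answers, the sample frame can be rebuilt from the table in the question. This is a minimal sketch that assumes DATE holds plain strings; the question does not say whether the column was parsed as datetime.

```python
import pandas as pd

# Sample data from the question; DATE kept as strings (assumption)
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "DATE": ["2012-10-11", "2012-10-12", "2012-10-13"] * 2,
    "NUMBER": [5, 4, 3, 2, 1, 6],
})
print(df)
```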

3 answers:

Answer 0 (score: 3)

Treat specific dates as equivalent

  • Use a dictionary to specify the equivalences
  • Use replace to swap them out
  • Use groupby as normal

df.replace({'DATE': {'2012-10-13': '2012-10-12'}}) \
  .groupby(['ID', 'DATE'], as_index=False).sum()

   ID        DATE  NUMBER
0   1  2012-10-11       5
1   1  2012-10-12       7
2   2  2012-10-11       2
3   2  2012-10-12       7
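The `replace` call above matches the string `'2012-10-13'`, so it assumes DATE holds strings. If the column has been converted with `pd.to_datetime` (an assumption, not stated in the question), the string key would not match. A minimal sketch of the same dictionary-of-equivalents idea using `pd.Timestamp` keys instead:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    # Assumption: DATE was parsed as datetime64
    "DATE": pd.to_datetime(["2012-10-11", "2012-10-12", "2012-10-13"] * 2),
    "NUMBER": [5, 4, 3, 2, 1, 6],
})

# Date to fold -> date it should be added to
equivalents = {pd.Timestamp("2012-10-13"): pd.Timestamp("2012-10-12")}

# Relabel matching dates and leave all others unchanged
df["DATE"] = df["DATE"].map(lambda d: equivalents.get(d, d))

# Sum NUMBER per (ID, DATE) as in the string version
result = df.groupby(["ID", "DATE"], as_index=False)["NUMBER"].sum()
print(result)
```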

Answer 1 (score: 1)

import pandas as pd

## First change the date

for i in range(len(df)): 
    if df.loc[i,'DATE'] == "2012-10-13":
        df.loc[i,'DATE'] = "2012-10-12"

## Then do a groupby sum 

df = pd.DataFrame({'SUM' : df.groupby(['ID','DATE'])['NUMBER'].sum()})

My output:

               SUM
ID DATE           
1  2012-10-11    5
   2012-10-12    7
2  2012-10-11    2
   2012-10-12    7
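The loop above looks rows up by label with `df.loc[i, ...]`, so it only works when `df` has the default 0..n-1 index. A minimal vectorized sketch of the same two steps, using a boolean mask in place of the loop (sample data from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "DATE": ["2012-10-11", "2012-10-12", "2012-10-13"] * 2,
    "NUMBER": [5, 4, 3, 2, 1, 6],
})

# Step 1: relabel every 2012-10-13 row as 2012-10-12 in one assignment
df.loc[df["DATE"] == "2012-10-13", "DATE"] = "2012-10-12"

# Step 2: sum NUMBER per (ID, DATE); keeps the (ID, DATE) MultiIndex and SUM column
summed = df.groupby(["ID", "DATE"])["NUMBER"].sum().to_frame("SUM")
print(summed)
```

This version does not depend on the index labels, so it also works after filtering or sorting `df`.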

Answer 2 (score: 0)

Warning: the code below gets the job done, but it is not very self-explanatory!

import pandas as pd

# I want to group by ID, excluding the rows for 2012-10-11
df1 = df.loc[df.DATE != '2012-10-11']

# 1 - df1.groupby('ID', as_index=False)['NUMBER'].sum() -> I group by ID to get the sum of NUMBER only
# 2 - df1.drop('NUMBER', axis=1) -> I drop the NUMBER column to avoid overlapping columns
# 3 - I merge the two DataFrames to get the sum value on every remaining row
df1 = df1.drop('NUMBER', axis=1).merge(
    df1.groupby('ID', as_index=False)['NUMBER'].sum(), on='ID')

# I get back the rows for 2012-10-11 (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df1 = pd.concat([df1, df.loc[df.DATE == '2012-10-11']], sort=True)
df1 = df1.sort_values(['ID', 'DATE'])

# I delete the 2012-10-13 rows, which I don't want
df1 = df1.loc[df1.DATE != '2012-10-13']

print(df1)
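The merge step above attaches each ID's total back onto its rows. `groupby(...).transform("sum")` performs the same broadcast without a merge. A minimal sketch, still assuming (as the code above does) that only the 2012-10-12 and 2012-10-13 rows are to be combined:

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "DATE": ["2012-10-11", "2012-10-12", "2012-10-13"] * 2,
    "NUMBER": [5, 4, 3, 2, 1, 6],
})

# Rows whose NUMBER values should be combined within each ID
to_merge = df["DATE"].isin(["2012-10-12", "2012-10-13"])

# Per-ID total over those rows, broadcast back to each of them (same index)
totals = df[to_merge].groupby("ID")["NUMBER"].transform("sum")

result = df.copy()
result.loc[to_merge, "NUMBER"] = totals  # aligns on the index

# Drop the 2012-10-13 rows; their values now live in the 2012-10-12 rows
result = result[result["DATE"] != "2012-10-13"].reset_index(drop=True)
print(result)
```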