Question

I have a DataFrame that currently looks like this:

 import numpy as np
 import pandas as pd
 df = pd.DataFrame(np.array([[1, 2, 'Apples 20pk ABC123', 4, 5],
                             [6, 7, 'Bananas 20pk ABC123', 9, 0],
                             [1, 2, 'Apples 20pk ABC123', 8, 9]]),
                   columns=['Serial #', 'Branch ID', 'Info', 'Value1', 'Value2'])

   Serial #  Branch ID                 Info  Value1  Value2
0         1          2   Apples 20pk ABC123       4       5
1         6          7  Bananas 20pk ABC123       9       0
2         1          2   Apples 20pk ABC123       8       9

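I want to group by the columns Serial #, Branch ID, and Info to get the cumulative sum of the columns Value1 and Value2. Essentially, values should only be summed together when the values in the Serial #, Branch ID, and Info columns all match.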

I thought I could do the following, but I only get back a DataFrame that contains nothing but the index:

# converting the other columns (not the three groupby columns) to numeric values
df[df.columns.difference(['Serial #', 'Branch ID', 'Info'])].apply(pd.to_numeric, errors='coerce')

df_cumulative = df.groupby(['Serial #', 'Branch ID', 'Info']).cumsum().add_suffix('_cumulative')

The ideal end result would be:

   Serial #  Branch ID                 Info  Value1_cum  Value2_cum
0         1          2   Apples 20pk ABC123           4           5
1         6          7  Bananas 20pk ABC123           9           0
2         1          2   Apples 20pk ABC123          12          14

Thanks!

Answer 1

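In your case, the converted values need to be written back to df (apply returns a new DataFrame and does not modify df in place). After that, the grouped cumulative sum works: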

l = ['Serial #', 'Branch ID', 'Info']
# drop(l, 1) relied on a positional axis argument, which pandas 2.0 no longer accepts
df = df.drop(columns=l).apply(pd.to_numeric, errors='coerce').combine_first(df)
df = df.set_index(l).groupby(level=[0, 1, 2]).cumsum().add_suffix('_cumulative').reset_index()
df
   Serial #  Branch ID  ... Value1_cumulative  Value2_cumulative
0       1.0        2.0  ...                 4                  5
1       6.0        7.0  ...                 9                  0
2       1.0        2.0  ...                12                 14
[3 rows x 5 columns]
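
As a possible alternative to the approach above, here is a minimal sketch that assigns the converted columns back directly and then selects only the value columns before calling cumsum. It is not part of the original answer. The names key_cols, value_cols, cumulative and result are illustrative, and the sketch assumes the same sample data as in the question.

```python
import numpy as np
import pandas as pd

# Same data as in the question; np.array turns every cell into a string.
df = pd.DataFrame(
    np.array([
        [1, 2, 'Apples 20pk ABC123', 4, 5],
        [6, 7, 'Bananas 20pk ABC123', 9, 0],
        [1, 2, 'Apples 20pk ABC123', 8, 9],
    ]),
    columns=['Serial #', 'Branch ID', 'Info', 'Value1', 'Value2'],
)

key_cols = ['Serial #', 'Branch ID', 'Info']   # grouping keys (illustrative name)
value_cols = df.columns.difference(key_cols)   # the remaining columns: Value1, Value2

# Assign the converted columns back; apply() alone returns a new frame
# and leaves df unchanged, which is what went wrong in the question.
df[value_cols] = df[value_cols].apply(pd.to_numeric, errors='coerce')

# Cumulative sum within each key group; the result keeps the original row index.
cumulative = (
    df.groupby(key_cols)[list(value_cols)]
      .cumsum()
      .add_suffix('_cumulative')
)

# Put the key columns back next to the cumulative sums.
result = pd.concat([df[key_cols], cumulative], axis=1)
print(result)
```

In this sketch the key columns stay as strings, because only the value columns are converted. Rows are grouped correctly either way, since matching keys compare equal as strings.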
