Cumulative groupby in pandas

Time: 2019-01-20 11:38:51

Tags: python pandas dataframe

I have a data frame like the one below, and I want to accumulate amount along name and date:

df = 
     name  amount  date
 0     A     10      1
 1     B     15      1
 2     A      5      2
 3     C      7      3
 4     A      8      4
 5     B     10      4
 6     C     11      4
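
For anyone who wants to reproduce this, the frame shown above could be built roughly as follows. This is only a sketch: the values are copied from the display, and treating date as a plain integer period is an assumption.

```python
import pandas as pd

# Values copied from the displayed frame; `date` is assumed to be an integer period.
df = pd.DataFrame({
    "name": ["A", "B", "A", "C", "A", "B", "C"],
    "amount": [10, 15, 5, 7, 8, 10, 11],
    "date": [1, 1, 2, 3, 4, 4, 4],
})
print(df)
```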

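I want to show the cumulative value over the periods represented by the date column. For example, A has a value of 10 in period 1, 5 in period 2, 0 in period 3 (because it does not appear there), and 8 in period 4, so df_result should show the running totals 10, 15, 15 and 23 for A. C should not appear until period 3, because it has no value before that period.

I have tried different combinations of groupby, cumsum and even stack, but I could not get anything close to the goal.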

1 Answer:

Answer 0: (score: 1)

See if this helps:

>>> df.groupby(by=['name','date']).sum().groupby(level=[0]).cumsum().reset_index()
  name  date  amount
0    A     1      10
1    A     2      15
2    A     4      23
3    B     1      15
4    B     4      25
5    C     3       7
6    C     4      18
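
The one-liner above may be easier to follow when split into steps. The sketch below rebuilds the example frame (values copied from the question) and uses the intermediate names per_period and running, which are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "A", "C", "A", "B", "C"],
    "amount": [10, 15, 5, 7, 8, 10, 11],
    "date": [1, 1, 2, 3, 4, 4, 4],
})

# Step 1: total amount per (name, date) pair, in case a pair occurs more than once.
# groupby sorts the keys, so rows come out ordered by name, then date.
per_period = df.groupby(["name", "date"])["amount"].sum()

# Step 2: running total within each name, following the date order from step 1.
running = per_period.groupby(level="name").cumsum()

# Step 3: turn the (name, date) index back into ordinary columns.
result = running.reset_index()
print(result)
```

Note that this only accumulates over the periods in which a name actually has a row, so A gets no row for date 3 here.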

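Another approach, as @Jon suggested in the comments, is to pivot, which gets you close to the output you showed.

>>> df = df.pivot(index='date', columns='name', values='amount').fillna(0).stack().groupby(level=1).cumsum().astype('int')[lambda v: v != 0].reset_index()

Rename the last column, because after reset_index it is labelled 0.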

>>> df.rename(columns={0: 'amount'}, inplace=True)
>>> df
   date name  amount
0     1    A      10
1     1    B      15
2     2    A      15
3     2    B      15
4     3    A      15
5     3    B      15
6     3    C       7
7     4    A      23
8     4    B      25
9     4    C      18
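
A possible variant of the pivot approach, sketched under some assumptions: instead of filtering on != 0 (which would also drop a name whose running total is legitimately zero), it hides a name only in the periods before its first row. The names wide, appeared, cumulative and masked are illustrative, and pivot assumes each (date, name) pair occurs at most once.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["A", "B", "A", "C", "A", "B", "C"],
    "amount": [10, 15, 5, 7, 8, 10, 11],
    "date": [1, 1, 2, 3, 4, 4, 4],
})

# Wide layout: one row per date, one column per name, NaN where a name has no row.
wide = df.pivot(index="date", columns="name", values="amount")

# True from the first period in which a name has a value, and for every later period.
appeared = wide.notna().cummax().astype(bool)

# Running total per name, counting missing periods as 0.
cumulative = wide.fillna(0).cumsum()

# Keep the running total only once the name has appeared; NaN before that.
masked = cumulative.where(appeared)

# Back to long format, dropping the periods before a name's first appearance.
result = (
    masked.reset_index()
    .melt(id_vars="date", var_name="name", value_name="amount")
    .dropna(subset=["amount"])
    .astype({"amount": int})
    .sort_values(["date", "name"])
    .reset_index(drop=True)
)
print(result)
```

On the example data this should give the same ten rows as the output above.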