Equivalent of ROWS BETWEEN in pandas (Python)

Asked: 2018-06-07 11:04:06

Tags: python sql pandas sqlite azure-machine-learning-studio

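I am using Azure Machine Learning Studio and want to add a running total to my dataset. The dataset includes a date column, and for each row I want to sum all rows (within a group) dated on or before that row's date.

In SQL Server, I would use: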

    SELECT [t1].*,
           SUM([t1].[Amount (Settlement CCY)])
           OVER (
             PARTITION BY [t1].[Contract Ref], [t1].[LOBCode], [t1].[Superline], [t1].[Occupation], [t1].[TransType], [t1].[SettCCY]
             ORDER BY     [t1].[Transaction Date] ASC
             ROWS BETWEEN UNBOUNDED PRECEDING
                  AND     CURRENT ROW
           )
    FROM [t1]
    GROUP BY [t1].[contract ref], [t1].[Transaction date], [t1].[LOBCode], [t1].[Superline], [t1].[Occupation], [t1].[TransType], [t1].[SettCCY]

But Azure Machine Learning uses SQLite, in which the OVER / PARTITION BY clause is not implemented.

I tried an alternative in Python / pandas:

    dataframe1 = dataframe1.assign(
        cumAMTscTD=dataframe1.groupby(
            ['ContractRef', 'Basis', 'LOBCode', 'Superline', 'Occupation', 'TransType', 'SettCCY']
        )['AmtSettCCY'].transform('sum')
    ).sort_values(['ContractRef', 'TransDate'])

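But this sums everything in the group, not just the rows up to the current row's date. So I believe it does not cover this part:

    ROWS BETWEEN UNBOUNDED PRECEDING
         AND     CURRENT ROW

How would I achieve this?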

1 Answer:

Answer 0 (score: 0)

In SQLite, you can implement the logic as:

    with t as (
          select t1.contract_ref, t1.transaction_date, sum(t1.amount) as amount
          from t1
          group by t1.contract_ref, t1.transaction_date
         )
    select t.*,
           (select sum(t2.amount)
            from t t2
            where t2.contract_ref = t.contract_ref and
                  t2.transaction_date <= t.transaction_date
           ) as running_amount
    from t;
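
On the pandas side, a per-group cumulative sum is probably the closest match to ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. The following is only a minimal sketch: the sample rows are made up, the column names ContractRef, TransDate and AmtSettCCY are taken from the attempt in the question, and group_cols is a hypothetical name that would need to list all the PARTITION BY columns.

```python
import pandas as pd

# Made-up sample data; column names follow the pandas attempt in the question.
df = pd.DataFrame({
    "ContractRef": ["A", "A", "B", "A", "B"],
    "TransDate": pd.to_datetime(
        ["2018-01-03", "2018-01-01", "2018-01-02", "2018-01-02", "2018-01-01"]
    ),
    "AmtSettCCY": [30.0, 10.0, 5.0, 20.0, 1.0],
})

# Assumption: extend this list with LOBCode, Superline, etc. as required.
group_cols = ["ContractRef"]

# Sort so that rows are in date order within each group (the ORDER BY part),
# then accumulate within each group (the PARTITION BY part).
df = df.sort_values(group_cols + ["TransDate"])
df["cumAMTscTD"] = df.groupby(group_cols)["AmtSettCCY"].cumsum()

print(df)
```

Replacing transform('sum') with cumsum() is the key change: transform('sum') broadcasts the whole group's total to every row, while cumsum() only adds up the rows seen so far. If several rows can share the same date within a group, cumsum() still steps row by row, so to match the SQLite query above you would first aggregate per group and date.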