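I'm using Azure Machine Learning Studio and want to add a running total to my dataset. The dataset includes a date column, and for each row I want to sum all rows (within a group) dated on or before that row's date.
In SQL Server, I would use: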
SELECT [t1].*,
SUM([t1].[Amount (Settlement CCY)])
OVER (
PARTITION BY [t1].[Contract Ref], [t1].[LOBCode], [t1].[Superline], [t1].[Occupation], [t1].[TransType], [t1].[SettCCY]
ORDER BY [t1].[Transaction Date] ASC
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
)
FROM [t1]
GROUP BY [t1].[contract ref], [t1].[Transaction date], [t1].[LOBCode], [t1].[Superline], [t1].[Occupation], [t1].[TransType], [t1].[SettCCY]
But Azure Machine Learning uses SQLite, in which the OVER / PARTITION BY clause is not implemented.
I tried an alternative in Python / pandas:
dataframe1 = dataframe1.assign(cumAMTscTD=dataframe1.groupby(['ContractRef', 'Basis', 'LOBCode', 'Superline', 'Occupation', 'TransType', 'SettCCY'])['AmtSettCCY'].transform('sum')).sort_values(['ContractRef','TransDate'])
But this sums everything in the group, not just the rows up to the current row's date. So I assume it doesn't account for:
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
How would I achieve this?
Answer 0 (score: 0)
In SQLite, you can implement the logic with a correlated subquery:
with t as (
select t1.contract_ref, t1.transaction_date, sum(t1.amount) as amount
from t1
group by t1.contract_ref, t1.transaction_date
)
select t.*,
(select sum(t2.amount)
from t t2
where t2.contract_ref = t.contract_ref and
t2.transaction_date <= t.transaction_date
) as running_amount
from t;
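If you would rather stay in pandas, the same two steps (aggregate per group and date, then accumulate in date order) can be sketched roughly as below. This is only a sketch: the sample data is made up, the column names are taken from the pandas attempt in the question, and group_cols is a placeholder you would extend with the other grouping columns.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question's pandas attempt.
df = pd.DataFrame({
    "ContractRef": ["A", "A", "A", "B", "B"],
    "TransDate": pd.to_datetime(
        ["2018-01-01", "2018-01-01", "2018-02-01", "2018-01-15", "2018-03-01"]
    ),
    "AmtSettCCY": [10.0, 5.0, 20.0, 7.0, 3.0],
})

# Assumption: extend this list with LOBCode, Superline, etc. as needed.
group_cols = ["ContractRef"]

# Step 1: one row per group and date (mirrors the CTE in the SQL above).
daily = (
    df.groupby(group_cols + ["TransDate"], as_index=False)["AmtSettCCY"]
      .sum()
      .sort_values(group_cols + ["TransDate"])
)

# Step 2: cumulative sum within each group, in date order.
daily["cumAMTscTD"] = daily.groupby(group_cols)["AmtSettCCY"].cumsum()

print(daily)
```

Aggregating by date first means that rows sharing a date all get the same total, which matches the `<=` comparison in the SQL. Calling cumsum directly on the unaggregated rows would instead behave like ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, where tied dates get different partial sums.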