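I have daily customer payment transaction data in chronological order: the first column is customerID and the second is transactionID, which is the primary key. The third through fifth columns are TransactionDate, TransactionType and TransactionAmount. TransactionType is either a normal transaction ("ACH") with a positive TransactionAmount, or a rejection record for a previous transaction ("ACH rejections") with a negative TransactionAmount. A rejection usually occurs within 3 business days. I want to map each rejection record back to its original transaction by matching on customerID and TransactionAmount.
How can I do this in SQL for a large dataset?
Thanks!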
Answer 0 (score: 0)
Assume the following schema and some sample data that fit your business case.
CREATE TABLE customer_payment_transaction (
customerID bigint NOT NULL,
transactionID bigint PRIMARY KEY,
TransactionDate timestamp NOT NULL,
TransactionType varchar(30) NOT NULL,
TransactionAmount decimal(10,2) NOT NULL
);
INSERT INTO customer_payment_transaction
VALUES
(1,1, '2017-12-10 09:34:54', 'ACH' , 24.88),
(1,2, '2017-12-11 09:34:54', 'ACH' , 34.88),
(1,3, '2017-12-12 09:34:54', 'ACH rejections' , -34.88),
(2,4, '2017-12-13 09:34:54', 'ACH' , 54.88),
(2,5, '2017-12-14 09:34:54', 'ACH' , 94.88),
(2,6, '2017-12-15 09:34:54', 'ACH' , 104.88),
(2,7, '2017-12-16 09:34:54', 'ACH rejections' , -94.88),
(2,8, '2017-12-17 09:34:54', 'ACH rejections' , -104.88),
(1,9, '2017-12-17 10:34:54', 'ACH' , 24.88),
(1,10,'2017-12-18 09:34:54', 'ACH rejections' , -24.88);
The solution is then as follows:
SELECT
inbound.customerID,
inbound.TransactionAmount AS transaction_amount,
MIN(inbound.TransactionDate) AS purchase_date,
MIN(outbound.TransactionDate) AS reject_date
FROM customer_payment_transaction inbound
INNER JOIN customer_payment_transaction outbound
ON outbound.customerID = inbound.customerID
AND ABS(outbound.TransactionAmount) = inbound.TransactionAmount
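-- a rejection must come after the payment it reverses...
AND outbound.TransactionDate > inbound.TransactionDate
-- ...and within 72 hours of it (calendar hours, not business days)
AND outbound.TransactionDate < DATE_ADD(inbound.TransactionDate, INTERVAL 72 HOUR)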
AND outbound.TransactionType = 'ACH rejections'
WHERE inbound.TransactionType = 'ACH'
GROUP BY inbound.customerID, inbound.TransactionAmount;
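This will be inaccurate if, within 72 hours, there are two outbound (rejection) records OR two inbound (payment) records with the same customerID and TransactionAmount. For those cases you will need additional identifier columns, which you will have to find or create.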
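One possible way to soften the "two inbound" case is to return one row per rejection and pick the most recent earlier ACH payment with the same customerID and amount. Below is a minimal sketch of that idea, not a definitive implementation. It is wrapped in Python with an in-memory SQLite database only so it can be run as-is. The names `match_rejections`, `MATCH_SQL` and `SAMPLE_ROWS` are made up for illustration, the SQL is in SQLite dialect (window functions need SQLite 3.25 or newer), and the 3-day window counts calendar days rather than business days.

```python
import sqlite3

# Sample rows copied from the INSERT statement above:
# (customerID, transactionID, TransactionDate, TransactionType, TransactionAmount)
SAMPLE_ROWS = [
    (1, 1, "2017-12-10 09:34:54", "ACH", 24.88),
    (1, 2, "2017-12-11 09:34:54", "ACH", 34.88),
    (1, 3, "2017-12-12 09:34:54", "ACH rejections", -34.88),
    (2, 4, "2017-12-13 09:34:54", "ACH", 54.88),
    (2, 5, "2017-12-14 09:34:54", "ACH", 94.88),
    (2, 6, "2017-12-15 09:34:54", "ACH", 104.88),
    (2, 7, "2017-12-16 09:34:54", "ACH rejections", -94.88),
    (2, 8, "2017-12-17 09:34:54", "ACH rejections", -104.88),
    (1, 9, "2017-12-17 10:34:54", "ACH", 24.88),
    (1, 10, "2017-12-18 09:34:54", "ACH rejections", -24.88),
]

# For every rejection, list the candidate payments (same customer, opposite
# amount, dated before the rejection but no more than 3 calendar days before),
# then keep only the most recent candidate (rn = 1).
MATCH_SQL = """
WITH candidates AS (
    SELECT
        rej.transactionID AS reject_id,
        pay.transactionID AS payment_id,
        ROW_NUMBER() OVER (
            PARTITION BY rej.transactionID
            ORDER BY pay.TransactionDate DESC, pay.transactionID DESC
        ) AS rn
    FROM customer_payment_transaction AS rej
    INNER JOIN customer_payment_transaction AS pay
        ON pay.customerID = rej.customerID
       AND pay.TransactionAmount = -rej.TransactionAmount
       AND pay.TransactionType = 'ACH'
       AND pay.TransactionDate < rej.TransactionDate
       AND pay.TransactionDate >= datetime(rej.TransactionDate, '-3 days')
    WHERE rej.TransactionType = 'ACH rejections'
)
SELECT reject_id, payment_id
FROM candidates
WHERE rn = 1
ORDER BY reject_id
"""


def match_rejections(rows):
    """Return a list of (reject_transaction_id, payment_transaction_id) pairs."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            """
            CREATE TABLE customer_payment_transaction (
                customerID        INTEGER NOT NULL,
                transactionID     INTEGER PRIMARY KEY,
                TransactionDate   TEXT    NOT NULL,
                TransactionType   TEXT    NOT NULL,
                TransactionAmount REAL    NOT NULL
            )
            """
        )
        conn.executemany(
            "INSERT INTO customer_payment_transaction VALUES (?, ?, ?, ?, ?)",
            rows,
        )
        return conn.execute(MATCH_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for reject_id, payment_id in match_rejections(SAMPLE_ROWS):
        print(reject_id, payment_id)
```

On MySQL 8.0 or later the same query should carry over with `DATE_SUB(rej.TransactionDate, INTERVAL 3 DAY)` in place of SQLite's `datetime(..., '-3 days')`. Note that two rejections with the same customerID and amount inside the window can still land on the same payment, so the need for an extra identifier column remains.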