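Basic T-SQL user here. I'm running into a problem while trying to complete a task and would appreciate some guidance. Apologies in advance for any mistakes, as English is not my native language.
I have a table with a large number of transactions. For simplicity, let's say it has only two columns: CUSTOMER_ID, which identifies my customer, and DATE, which is the transaction date.
My customers make a lot of transactions while they are in town, but then it can take them weeks, months, or even years to come back and start making transactions again. I want to somehow identify each of these "trips" and group the transactions involved, so that I can then calculate the trip duration, the number of transactions, and so on.
I want to consider a trip to start with any new transaction that happens after 10 days of idle time.
Let me try to explain my request better with a simple example:
This is my transactions table: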
+-------------+------------+
| CUSTOMER_ID | DATE       |
+-------------+------------+
| JHON        | 01-01-2016 |
| JHON        | 01-02-2016 |
| PEDRO       | 01-02-2016 |
| JHON        | 01-05-2016 |
| MIKE        | 01-05-2016 |
| MIKE        | 01-10-2016 |
| JHON        | 01-07-2016 |
| …           | …          |
| JHON        | 02-15-2016 |
| JHON        | 02-18-2016 |
| MIKE        | 02-19-2016 |
| MIKE        | 02-19-2016 |
+-------------+------------+
So far I have come up with this query to enumerate each customer's visits:
SELECT
    CUSTOMER_ID,
    DATE,
    ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY DATE) AS VISIT_NUM
FROM
    TRANSACTIONS
WHERE
    CUSTOMER_ID IN ('JHON', 'MIKE', 'PEDRO')
Running that query gives a result similar to this:
+-------------+------------+-----------+
| CUSTOMER_ID | DATE       | VISIT_NUM |
+-------------+------------+-----------+
| JHON        | 01-01-2016 | 1         |
| JHON        | 01-02-2016 | 2         |
| JHON        | 01-07-2016 | 3         |
| JHON        | 02-15-2016 | 4         |
| JHON        | 02-18-2016 | 5         |
| MIKE        | 01-05-2016 | 1         |
| MIKE        | 01-10-2016 | 2         |
| MIKE        | 02-19-2016 | 3         |
| MIKE        | 02-19-2016 | 4         |
| PEDRO       | 01-02-2016 | 1         |
+-------------+------------+-----------+
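Now comes the tricky part: I need to somehow write a query (possibly using the query above as a previous step) that shows me their trip information. Continuing the example, the ideal result would look like this: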
+-------------+----------+---------------+-------------+---------------+--------------+
| CUSTOMER_ID | TRIP_NUM | TRIP_START_DT | TRIP_END_DT | TRIP_DURATION | TRANSACTIONS |
+-------------+----------+---------------+-------------+---------------+--------------+
| JHON        | 1        | 01-01-2016    | 01-07-2016  | 7             | 3            |
| JHON        | 2        | 02-15-2016    | 02-18-2016  | 3             | 2            |
| MIKE        | 1        | 01-05-2016    | 01-10-2016  | 5             | 2            |
| MIKE        | 2        | 02-19-2016    | 02-19-2016  | 1             | 2            |
| PEDRO       | 1        | 01-02-2016    | 01-02-2016  | 1             | 1            |
+-------------+----------+---------------+-------------+---------------+--------------+
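As you can see, Mr. Jhon came 3 times in January and came back again in February. Since more than 10 days passed after his last January transaction, I want to treat his new transactions as a new "trip" for him. Mike also had some activity in January and came back in February; on his second trip he made two transactions on the same day, and I want to account for that as well. If a customer shows up for only one day with some activity (as in Mr. Pedro's case), I also want to count that single day as a trip.
I would greatly appreciate any pointers on this. I really don't know how to proceed (I have read about cursors, but at this point they look like black magic to me and I can't figure out how to apply them here).
Apologies again for any grammatical errors and for anything I may have left out. I will clarify further if necessary.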
Answer 0 (score: 2)
In your example, the trip duration is not calculated the same way for every customer, so I adjusted it to follow the first customer ID for all of them.
;with cte
as
(
    select cid, datee, datepart(month, datee) as monthh,
           dense_rank() over (partition by cid order by datepart(month, datee)) as samemonth,
           count(0) over (partition by cid, datepart(month, datee)) as cnt
    from #temp
)
, cte1 as
(
    select cid, max(samemonth) as tripnumber, min(datee) as startdate, max(datee) as enddate,
           max(cnt) as numberoftransactions
    from cte
    group by cid, samemonth
)
-- dateadd(day, 1, enddate) makes the day count inclusive of the end date
select *, datediff(day, startdate, dateadd(day, 1, enddate)) as duration
from cte1
Output
cid    tripnumber  startdate   enddate     numberoftransactions  duration
JHON   1           2016-01-01  2016-01-07  3                     7
JHON   2           2016-02-15  2016-02-18  2                     4
MIKE   1           2016-01-05  2016-01-10  2                     6
MIKE   2           2016-02-19  2016-02-19  2                     1
PEDRO  1           2016-01-02  2016-01-02  1                     1
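The query above reads from a temp table named #temp that the answer never defines. Below is a minimal setup sketch. The column names cid and datee come from the query, while the column types are an assumption. The rows are taken from the query-result table in the question, which appear to be the rows behind the output shown.

```sql
-- Hypothetical setup for the #temp table read by the query above.
-- Column names come from that query; the column types are an assumption.
CREATE TABLE #temp (cid varchar(10), datee date);

INSERT INTO #temp (cid, datee)
VALUES
    ('JHON',  '2016-01-01'),
    ('JHON',  '2016-01-02'),
    ('JHON',  '2016-01-07'),
    ('JHON',  '2016-02-15'),
    ('JHON',  '2016-02-18'),
    ('MIKE',  '2016-01-05'),
    ('MIKE',  '2016-01-10'),
    ('MIKE',  '2016-02-19'),
    ('MIKE',  '2016-02-19'),
    ('PEDRO', '2016-01-02');
```

Note that the query groups by calendar month (datepart(month, datee)) rather than by the 10-day idle gap, so it only lines up with the requested trips when each trip stays within one month and a customer has at most one trip per month.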
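Answer 1 (score: 1)
I found the perfect answer elsewhere. All credit goes to Reddit user nvarscar for the amazing solution!
I am copying his/her answer below in case someone else needs it in the future:
You can use window functions, which let you aggregate over the rows between the current row and all preceding rows. The code is rather long, but at least you will see the steps taken.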
DECLARE @t TABLE
([CUSTOMER_ID] varchar(5), [DATE] datetime)
;
INSERT INTO @t
([CUSTOMER_ID], [DATE])
VALUES
('JHON', '2016-01-01 00:00:00'),
('JHON', '2016-01-02 00:00:00'),
('PEDRO', '2016-01-02 00:00:00'),
('JHON', '2016-01-05 00:00:00'),
('MIKE', '2016-01-05 00:00:00'),
('MIKE', '2016-01-10 00:00:00'),
('JHON', '2016-01-07 00:00:00'),
('JHON', '2016-02-15 00:00:00'),
('JHON', '2016-02-18 00:00:00'),
('MIKE', '2016-02-19 00:00:00'),
('MIKE', '2016-02-19 00:00:00'),
('JHON', '2016-02-01 00:00:00'),
('JHON', '2016-02-02 00:00:00'),
('PEDRO', '2016-03-02 00:00:00'),
('JHON', '2016-03-05 00:00:00'),
('MIKE', '2016-05-05 00:00:00'),
('MIKE', '2016-05-10 00:00:00'),
('JHON', '2016-03-07 00:00:00'),
('JHON', '2016-04-15 00:00:00'),
('JHON', '2016-04-18 00:00:00'),
('MIKE', '2016-06-19 00:00:00'),
('MIKE', '2016-06-19 00:00:00')
;
WITH CTE1 AS (
    SELECT
        [CUSTOMER_ID]
      , [DATE]
      , COUNT(*) AS Transactions
    FROM @t
    GROUP BY
        [CUSTOMER_ID]
      , [DATE]
)
, CTE2 AS (
    SELECT
        [CUSTOMER_ID]
      , [DATE]
      , Transactions
      , DATEDIFF(day, LAG([DATE]) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE]), [DATE]) AS DaysSinceLastTransaction
    FROM CTE1
)
, CTE3 AS (
    SELECT
        [CUSTOMER_ID]
      , [DATE]
      , Transactions
      , CASE WHEN DaysSinceLastTransaction > 10 THEN 1 ELSE 0 END AS TripTag --Here we set the idle tag
    FROM CTE2
)
, CTE4 AS (
    SELECT
        [CUSTOMER_ID]
      , [DATE]
      , Transactions
      , SUM(TripTag) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripTag
    FROM CTE3
)
SELECT
    [CUSTOMER_ID]
  , TripTag + 1 AS TripNumber
  , MIN([DATE]) AS TripStartDate
  , MAX([DATE]) AS TripEndDate
  , DATEDIFF(day, MIN([DATE]), MAX([DATE])) AS TripDuration
  , SUM(Transactions) AS Transactions
FROM CTE4
GROUP BY [CUSTOMER_ID], TripTag
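The question's desired output counts some durations inclusively (01-01 to 01-07 shown as 7 days), whereas DATEDIFF(day, start, end) returns the difference between the two dates (6 in that case). If an inclusive count is what is wanted, the steps above can be condensed into the sketch below. The CTE names (Daily, Flagged, Numbered), the column name TripDurationInclusive, and the sample rows are illustrative assumptions, not part of the original answer. LAG requires SQL Server 2012 or later.

```sql
-- Sketch: the same steps as above, with an inclusive day count.
-- Table variable, CTE names and rows are illustrative assumptions.
DECLARE @t TABLE ([CUSTOMER_ID] varchar(10), [DATE] date);

INSERT INTO @t ([CUSTOMER_ID], [DATE])
VALUES
    ('JHON',  '2016-01-01'), ('JHON',  '2016-01-02'), ('JHON',  '2016-01-05'),
    ('JHON',  '2016-01-07'), ('JHON',  '2016-02-15'), ('JHON',  '2016-02-18'),
    ('MIKE',  '2016-01-05'), ('MIKE',  '2016-01-10'), ('MIKE',  '2016-02-19'),
    ('MIKE',  '2016-02-19'), ('PEDRO', '2016-01-02');

WITH Daily AS (
    -- One row per customer and day, so same-day transactions cannot
    -- end up on different sides of a trip boundary.
    SELECT [CUSTOMER_ID], [DATE], COUNT(*) AS Transactions
    FROM @t
    GROUP BY [CUSTOMER_ID], [DATE]
)
, Flagged AS (
    SELECT
        [CUSTOMER_ID]
      , [DATE]
      , Transactions
        -- 1 when more than 10 days passed since the customer's previous active day.
        -- The first day per customer has no previous row: LAG returns NULL, flag is 0.
      , CASE
            WHEN DATEDIFF(day, LAG([DATE]) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE]), [DATE]) > 10
            THEN 1 ELSE 0
        END AS NewTrip
    FROM Daily
)
, Numbered AS (
    SELECT
        [CUSTOMER_ID]
      , [DATE]
      , Transactions
        -- Running total of the flags: increases by 1 at the start of each new trip.
      , SUM(NewTrip) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE]
                           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripTag
    FROM Flagged
)
SELECT
    [CUSTOMER_ID]
  , TripTag + 1 AS TripNumber
  , MIN([DATE]) AS TripStartDate
  , MAX([DATE]) AS TripEndDate
  , DATEDIFF(day, MIN([DATE]), MAX([DATE])) + 1 AS TripDurationInclusive
  , SUM(Transactions) AS Transactions
FROM Numbered
GROUP BY [CUSTOMER_ID], TripTag
ORDER BY [CUSTOMER_ID], TripNumber;
```

With the + 1, a trip that starts and ends on the same day has a duration of 1 rather than 0, which matches how the question treats Pedro's single-day visit.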