Based on transaction date

Time: 2016-07-20 15:40:54

Tags: sql sql-server tsql

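Basic T-SQL user here. I'm having trouble accomplishing a task and would appreciate some guidance. Apologies in advance for any mistakes, as English is not my native language.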

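I have a table with a large number of transactions. For simplicity, let's say it has only two columns: CUSTOMER_ID, which identifies my customer, and DATE, which is the transaction date.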

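My customers make a lot of transactions while they are in town, but afterwards it can take them weeks, months, or even years to come back and start transacting again. I would like to identify each of these "Trips" and group the transactions involved, and then calculate the trip duration, number of transactions, and so on.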

I want to treat any transaction that occurs after an idle period of more than 10 days as the start of a new trip.

Let me try to explain my request better with a simple example:

This is my transactions table:

+-------------+------------+
| CUSTOMER_ID |    DATE    |
+-------------+------------+
| JHON        | 01-01-2016 |
| JHON        | 01-02-2016 |
| PEDRO       | 01-02-2016 |
| JHON        | 01-05-2016 |
| MIKE        | 01-05-2016 |
| MIKE        | 01-10-2016 |
| JHON        | 01-07-2016 |
| …           | …          |
| JHON        | 02-15-2016 |
| JHON        | 02-18-2016 |
| MIKE        | 02-19-2016 |
| MIKE        | 02-19-2016 |
+-------------+------------+

So far, I have come up with this query to number each customer's visits:

SELECT
    CUSTOMER_ID,
    DATE,
    ROW_NUMBER() OVER(PARTITION BY CUSTOMER_ID ORDER BY DATE) as VISIT_NUM

FROM
    TRANSACTIONS
WHERE
    CUSTOMER_ID IN ('JHON','MIKE','PEDRO')

Running that query gives a result similar to this:

+-------------+------------+-----------+
| CUSTOMER_ID |    DATE    | VISIT_NUM |
+-------------+------------+-----------+
| JHON        | 01-01-2016 |         1 |
| JHON        | 01-02-2016 |         2 |
| JHON        | 01-07-2016 |         3 |
| JHON        | 02-15-2016 |         4 |
| JHON        | 02-18-2016 |         5 |
| MIKE        | 01-05-2016 |         1 |
| MIKE        | 01-10-2016 |         2 |
| MIKE        | 02-19-2016 |         3 |
| MIKE        | 02-19-2016 |         4 |
| PEDRO       | 01-02-2016 |         1 |
+-------------+------------+-----------+

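Now comes the tricky part: I need to create a query (possibly using the query above as a previous step) that shows me their trip information. Continuing the example, the ideal result would look like this: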

+-------------+----------+---------------+-------------+---------------+--------------+
| CUSTOMER_ID | TRIP_NUM | TRIP_START_DT | TRIP_END_DT | TRIP_DURATION | TRANSACTIONS |
+-------------+----------+---------------+-------------+---------------+--------------+
| JHON        |        1 | 01-01-2016    | 01-07-2016  |             7 |            3 |
| JHON        |        2 | 02-15-2016    | 02-18-2016  |             3 |            2 |
| MIKE        |        1 | 01-05-2016    | 01-10-2016  |             5 |            2 |
| MIKE        |        2 | 02-19-2016    | 02-19-2016  |             1 |            2 |
| PEDRO       |        1 | 01-02-2016    | 01-02-2016  |             1 |            1 |
+-------------+----------+---------------+-------------+---------------+--------------+

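As you can see, Mr. Jhon came 3 times in January and came back again in February. Since more than 10 days passed after his last January transaction, I want to treat his new transactions as a new "Trip" for him. Mike also had some activity in January and came back in February; on his second trip he made two transactions on the same day, and I want to account for that as well. If a customer shows up for only one day and has some activity (as in Mr. Pedro's case), I also want to treat that single day as a trip.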

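I would really appreciate any pointers on this, as I don't know how to proceed (I have read about cursors, but at this point they look like black magic to me and I couldn't find a way to apply them here).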

Again, apologies for any grammatical errors and for anything I may have left out. I will clarify further if necessary.

2 answers:

Answer 0: (score: 2)

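In your example, the trip duration is not calculated the same way for every customer, so I have adjusted it to follow the calculation used for the first customer ID throughout.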


;with cte as
(
    -- Tag every transaction with its month and a per-customer month rank
    select cid, datee,
           datepart(month, datee) as monthh,
           dense_rank() over (partition by cid order by datepart(month, datee)) as samemonth,
           count(0) over (partition by cid, datepart(month, datee)) as cnt
    from #temp
)
, cte1 as
(
    -- One row per customer and month: first/last date and transaction count
    select cid, max(samemonth) as tripnumber, min(datee) as startdate, max(datee) as enddate,
           max(cnt) as numberoftransactions
    from cte
    group by cid, samemonth
)
-- Duration counts both the start day and the end day
select *, datediff(day, startdate, dateadd(day, 1, enddate)) as duration
from cte1

Output

cid   tripnumber startdate      enddate    numberoftransactions duration
JHON    1        2016-01-01    2016-01-07   3                    7
JHON    2        2016-02-15    2016-02-18   2                    4
MIKE    1        2016-01-05    2016-01-10   2                    6
MIKE    2        2016-02-19    2016-02-19   2                    1
PEDRO   1        2016-01-02    2016-01-02   1                    1
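
One caveat worth keeping in mind: the query above groups transactions by calendar month, which happens to match the sample data but is not the same thing as the 10-day idle rule from the question. A minimal sketch with two hypothetical dates (made up for illustration, not taken from the original data) shows where the two rules disagree:

```sql
-- Hypothetical dates: only 5 days apart, but in different calendar months.
DECLARE @prev date = '2016-01-28', @curr date = '2016-02-02';

SELECT
    -- Gap in days; 10 or fewer means the 10-day rule keeps both in one trip
    DATEDIFF(day, @prev, @curr) AS days_between,
    -- Month-based grouping would still split them into two trips
    CASE WHEN DATEPART(month, @prev) = DATEPART(month, @curr)
         THEN 'same month'
         ELSE 'different month'
    END AS month_grouping;
```

Under the 10-day rule these two transactions belong to one trip, while grouping by DATEPART(month, ...) would report two.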

Answer 1: (score: 1)

I found the perfect answer elsewhere. All credit goes to Reddit user nvarscar for the amazing solution!

I'll copy his/her answer below in case anyone else needs it in the future:

  

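You can use window functions, which let you aggregate over the rows between the current row and all preceding rows. The code is rather long, but at least you will see the steps taken.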

DECLARE @t TABLE 
    ([CUSTOMER_ID] varchar(5), [DATE] datetime)
;

INSERT INTO @t
    ([CUSTOMER_ID], [DATE])
VALUES
    ('JHON', '2016-01-01 00:00:00'),
    ('JHON', '2016-01-02 00:00:00'),
    ('PEDRO', '2016-01-02 00:00:00'),
    ('JHON', '2016-01-05 00:00:00'),
    ('MIKE', '2016-01-05 00:00:00'),
    ('MIKE', '2016-01-10 00:00:00'),
    ('JHON', '2016-01-07 00:00:00'),
    ('JHON', '2016-02-15 00:00:00'),
    ('JHON', '2016-02-18 00:00:00'),
    ('MIKE', '2016-02-19 00:00:00'),
    ('MIKE', '2016-02-19 00:00:00'),
    ('JHON', '2016-02-01 00:00:00'),
    ('JHON', '2016-02-02 00:00:00'),
    ('PEDRO', '2016-03-02 00:00:00'),
    ('JHON', '2016-03-05 00:00:00'),
    ('MIKE', '2016-05-05 00:00:00'),
    ('MIKE', '2016-05-10 00:00:00'),
    ('JHON', '2016-03-07 00:00:00'),
    ('JHON', '2016-04-15 00:00:00'),
    ('JHON', '2016-04-18 00:00:00'),
    ('MIKE', '2016-06-19 00:00:00'),
    ('MIKE', '2016-06-19 00:00:00')
;


WITH CTE1 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, COUNT(*) AS Transactions
FROM @t
GROUP BY 
  [CUSTOMER_ID]
, [DATE]
)
, CTE2 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, DATEDIFF(day,LAG([DATE]) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE]),[DATE]) AS DaysSinceLastTransaction
FROM CTE1
)
, CTE3 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, CASE WHEN DaysSinceLastTransaction > 10 THEN 1 ELSE 0 END AS TripTag --Here we set the idle tag
FROM CTE2
)
, CTE4 AS (
SELECT 
  [CUSTOMER_ID]
, [DATE]
, Transactions
, SUM(TripTag) OVER (PARTITION BY [CUSTOMER_ID] ORDER BY [DATE] ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TripTag
FROM CTE3
)
SELECT 
  [CUSTOMER_ID]
, TripTag+1 AS TripNumber
, MIN ([DATE]) AS TripStartDate
, MAX ([DATE]) AS TripEndDate
, DATEDIFF(day, MIN ([DATE]), MAX ([DATE])) AS TripDuration
, SUM(Transactions) AS Transactions
FROM CTE4
GROUP BY [CUSTOMER_ID], TripTag
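
The desired output in the question counts a single-day trip as lasting 1 day, whereas DATEDIFF between two equal dates returns 0. If the inclusive count is what you want, a minimal sketch along the same lines could look like the one below. It is a variant written for illustration, not part of the quoted answer: the @trips table variable and the IsNewTrip and TRIP_NUM names are assumptions, and it needs SQL Server 2012 or later for LAG. It uses a RANGE frame instead of ROWS so that several transactions on the same day always land in the same trip without a separate GROUP BY step.

```sql
-- Sample rows mirroring the result table in the question (assumed table variable)
DECLARE @trips TABLE (CUSTOMER_ID varchar(5), [DATE] date);

INSERT INTO @trips (CUSTOMER_ID, [DATE])
VALUES ('JHON', '2016-01-01'), ('JHON', '2016-01-02'), ('JHON', '2016-01-07'),
       ('JHON', '2016-02-15'), ('JHON', '2016-02-18'),
       ('MIKE', '2016-01-05'), ('MIKE', '2016-01-10'),
       ('MIKE', '2016-02-19'), ('MIKE', '2016-02-19'),
       ('PEDRO', '2016-01-02');

WITH Flagged AS (
    SELECT CUSTOMER_ID, [DATE],
           -- 1 when more than 10 days have passed since the customer's previous
           -- transaction; the first row per customer has no previous row and gets 0
           CASE WHEN DATEDIFF(day,
                              LAG([DATE]) OVER (PARTITION BY CUSTOMER_ID ORDER BY [DATE]),
                              [DATE]) > 10
                THEN 1 ELSE 0 END AS IsNewTrip
    FROM @trips
),
Numbered AS (
    SELECT CUSTOMER_ID, [DATE],
           -- Running total of flags; RANGE treats same-date rows as peers,
           -- so duplicates on one day share the same trip number
           1 + SUM(IsNewTrip) OVER (PARTITION BY CUSTOMER_ID
                                    ORDER BY [DATE]
                                    RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TRIP_NUM
    FROM Flagged
)
SELECT CUSTOMER_ID,
       TRIP_NUM,
       MIN([DATE]) AS TRIP_START_DT,
       MAX([DATE]) AS TRIP_END_DT,
       DATEDIFF(day, MIN([DATE]), MAX([DATE])) + 1 AS TRIP_DURATION,  -- inclusive of both ends
       COUNT(*) AS TRANSACTIONS
FROM Numbered
GROUP BY CUSTOMER_ID, TRIP_NUM
ORDER BY CUSTOMER_ID, TRIP_NUM;
```

Dropping the `+ 1` gives the plain day difference, which is what the query above returns as TripDuration.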