我有一张表格,显示每家公司与他们相关的联系人以及关系持续的日期。我想要做的是为每家公司制定一系列关系存在的范围以及在此期间关联的联系人数量。
(我在搜索栏中找不到类似的问题,如果这是一个骗局,请道歉)
对于以下样本数据:
CREATE TABLE #overlap
(
contact_id NVARCHAR(255)
,company_id NVARCHAR(255)
,relationship_started DATETIME
,relationship_ended DATETIME
)
INSERT INTO #overlap VALUES ('19CB6330-8559-4357-BF70-8F2EBAE4CF90','AECA7DFD-4551-455A-B01D-0000AE19B712','2006-07-25 00:00:00.000','2010-04-11 10:52:00.000')
INSERT INTO #overlap VALUES ('90331A59-EED3-47D5-8885-4648825FE06F','AECA7DFD-4551-455A-B01D-0000AE19B712','2007-07-31 00:00:00.000','2007-08-24 01:09:00.000')
INSERT INTO #overlap VALUES ('CFF414A7-4AB7-4C38-9915-6A107C9044AE','AECA7DFD-4551-455A-B01D-0000AE19B712','2008-01-11 00:00:00.000','2008-02-05 08:23:00.000')
INSERT INTO #overlap VALUES ('EC520389-4B84-429B-97D2-9653CFC47669','AECA7DFD-4551-455A-B01D-0000AE19B712','2008-01-01 00:00:00.000','2011-02-08 09:00:00.000')
我想要以下结果:
CREATE TABLE #results
(
company_id NVARCHAR(255)
,start_date DATETIME
,end_date DATETIME
,relationship_number INT
,num_contacts INT
)
INSERT INTO #results VALUES('AECA7DFD-4551-455A-B01D-0000AE19B712','2006-07-25 00:00:00.000','2007-07-31 00:00:00.000',1,1)
INSERT INTO #results VALUES('AECA7DFD-4551-455A-B01D-0000AE19B712','2007-07-31 00:00:00.000','2007-08-24 01:09:00.000',2,2)
INSERT INTO #results VALUES('AECA7DFD-4551-455A-B01D-0000AE19B712','2007-08-24 01:09:00.000','2008-01-01 00:00:00.000',3,1)
INSERT INTO #results VALUES('AECA7DFD-4551-455A-B01D-0000AE19B712','2008-01-01 00:00:00.000','2008-01-11 00:00:00.000',4,2)
INSERT INTO #results VALUES('AECA7DFD-4551-455A-B01D-0000AE19B712','2008-01-11 00:00:00.000','2008-02-05 08:23:00.000',5,3)
INSERT INTO #results VALUES('AECA7DFD-4551-455A-B01D-0000AE19B712','2008-02-05 08:23:00.000','2010-04-11 10:52:00.000',6,2)
INSERT INTO #results VALUES('AECA7DFD-4551-455A-B01D-0000AE19B712','2010-04-11 10:52:00.000','2011-02-08 09:00:00.000',7,1)
我开始使用它来分组开始日期(然后重复结束日期),但无法确定如何使用组来区分连续范围
DENSE_RANK() OVER
(PARTITION BY company_id
ORDER BY DATEPART(YEAR, relationship_started)*12 + DATEPART(MONTH, relationship_started)) AS start_group
任何指向正确的方向感激不尽! :)
编辑包含更多样本数据:
CREATE TABLE #overlap
(
contact_id NVARCHAR(255)
,company_id NVARCHAR(255)
,relationship_started DATETIME
,relationship_ended DATETIME
)
INSERT INTO #overlap VALUES ('19CB6330-8559-4357-BF70-8F2EBAE4CF90','AECA7DFD-4551-455A-B01D-0000AE19B712','2006-07-25 00:00:00.000','2010-04-11 10:52:00.000')
INSERT INTO #overlap VALUES ('90331A59-EED3-47D5-8885-4648825FE06F','AECA7DFD-4551-455A-B01D-0000AE19B712','2007-07-31 00:00:00.000','2007-08-24 01:09:00.000')
INSERT INTO #overlap VALUES ('CFF414A7-4AB7-4C38-9915-6A107C9044AE','AECA7DFD-4551-455A-B01D-0000AE19B712','2008-01-11 00:00:00.000','2008-02-05 08:23:00.000')
INSERT INTO #overlap VALUES ('EC520389-4B84-429B-97D2-9653CFC47669','AECA7DFD-4551-455A-B01D-0000AE19B712','2008-01-01 00:00:00.000','2011-02-08 09:00:00.000')
INSERT INTO #overlap VALUES ('E892EF8B-962F-4974-A3ED-60F6E916DB86','0B10F07D-3662-454B-8FAC-87E5AE92EE8D','2008-07-01 00:00:00.000','2011-02-08 09:00:00.000')
INSERT INTO #overlap VALUES ('76F944E1-E9C8-486E-912B-4A322F5F0A05','6FB62827-A27C-4110-BBA4-BC04C84C0219','2010-12-06 00:00:00.000','2011-02-08 13:00:00.000')
INSERT INTO #overlap VALUES ('69E3768C-3118-48E1-B590-8A7D02726227','6FB62827-A27C-4110-BBA4-BC04C84C0219','1950-01-01 00:00:00.000','2010-12-06 00:00:00.000')
答案 0 :(得分:3)
WITH Dates
AS (SELECT company_id, relationship_started As Date
FROM #overlap
UNION
SELECT company_id, relationship_ended
FROM #overlap),
T
AS (SELECT COUNT(*) AS Cnt,
d.Date,
ROW_NUMBER() OVER (PARTITION BY d.company_id ORDER BY (d.Date)) AS RN,
d.company_id
FROM Dates d
left JOIN #overlap o
ON d.Date >= relationship_started
and d.Date < relationship_ended
and d.company_id = o.company_id
group by d.Date,
d.company_id)
SELECT t1.company_id,
t1.Date,
t2.Date,
ROW_NUMBER() over (PARTITION BY t1.company_id order by (select 0)) as relationship_number,
t1.Cnt
FROM T t1
JOIN T t2
ON t2.RN = t1.RN + 1 AND t1.company_id = t2.company_id
ORDER BY t1.company_id, t1.Date
答案 1 :(得分:1)
我还没有完全完成 - 但也许这个输入可以带你进入正确的方向。
我的想法是创建两个CTE(公用表表达式),它将列出新关系开始时的日期(值为+1),以及关系结束时的值(值为-1)
基本上,你会得到一份历史清单,列出与该公司的关系何时发生变化。
我现在已经走到这一步了:
;WITH DatesStarted AS
(SELECT relationship_started AS 'TheDate', 1 AS 'Delta'
FROM #overlap
),
DatesEnded AS
(SELECT relationship_ended AS 'TheDate', -1 AS 'Delta'
FROM #overlap
)
SELECT *
FROM DatesStarted
UNION
SELECT *
FROM DatesEnded
ORDER BY thedate
我得到这样的输出:
TheDate Delta
2006-07-25 00:00:00.000 1
2007-07-31 00:00:00.000 1
2007-08-24 01:09:00.000 -1
2008-01-01 00:00:00.000 1
2008-01-11 00:00:00.000 1
2008-02-05 08:23:00.000 -1
2010-04-11 10:52:00.000 -1
2011-02-08 09:00:00.000 -1
所以考虑到这一点,你现在应该在结果表中生成插入....还没有完成(我必须跑到会议上) - 但也许这有助于你开始!
答案 2 :(得分:1)
这是我的解决方案,用于记录。
不知怎的,我已经成功地对它进行了重组,所以对我来说它似乎不再那么重要。尽管如此,执行计划表明它的性能低于公认的解决方案。
;WITH
dates AS (
SELECT
company_id,
date = relationship_started,
count_change = +1
FROM #overlap
UNION ALL
SELECT
company_id,
date = relationship_ended,
count_change = -1
FROM #overlap
),
sorted_dates AS (
SELECT
company_id,
date,
count_change,
rownum = ROW_NUMBER() OVER (PARTITION BY company_id ORDER BY date)
FROM (
SELECT
company_id,
date,
count_change = SUM(count_change)
FROM dates
GROUP BY company_id, date
HAVING SUM(count_change) <> 0
) s
)
SELECT
sd1.company_id,
start_date = MAX(sd2.date),
end_date = sd1.date,
relationship_number = MAX(sd2.rownum),
num_contacts = SUM(sd2.count_change)
FROM sorted_dates sd1
INNER JOIN sorted_dates sd2 ON sd1.company_id = sd2.company_id
AND sd2.rownum BETWEEN 1 AND sd1.rownum - 1
GROUP BY sd1.company_id, sd1.date
答案 3 :(得分:0)
我相信这个答案适用于多家公司。评论嵌入到代码中:
WITH company_dates (company_id, date) AS (
/* get all relevant dates */
SELECT company_id, relationship_started AS date
FROM #overlap
UNION
SELECT company_id, relationship_ended
FROM #overlap
)
, sequenced_dates (company_id, date, sequence) AS (
/* assign dates a sequence for easy joining */
SELECT company_id, date, row_number() OVER (PARTITION BY company_id ORDER BY DATE) AS sequence
FROM company_dates
)
SELECT d1.company_id, d1.date AS start_date, d2.date AS end_date, row_number() OVER (PARTITION BY d1.company_id ORDER BY d1.date) AS relationship_number, count(*) AS numContacts
FROM sequenced_dates d1
JOIN sequenced_dates d2 /* join to get start and end date for each region */
ON d1.company_id = d2.company_id
AND d1.sequence = d2.sequence-1
JOIN #overlap o /* join to get all involved relationships */
ON o.company_id = d1.company_id
AND o.relationship_started <= d1.date
AND o.relationship_ended >= d2.date
GROUP BY d1.company_id, d1.date, d2.date
ORDER BY d1.company_Id, d1.date