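The table consists of the columns calling_party and called_party. Each record describes a connection between two users, where one user acts as the calling party and the other as the called party.
The same two users can have two connections; in that case the calling/called roles are swapped when the direction changes.
To the original table (monthly_connections) I added two more columns, common_neighbors and total_neighbors, which store the number of common and total neighbors. To clarify the terms common_neighbors and total_neighbors, I added the following picture:
In this case, the observed connection has 2 neighbors common to the calling and called party, and 6 total neighbors.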
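To make the two terms concrete, here is a minimal Python sketch that computes them for a single connection with plain sets. The edge list is made up for illustration (it is not the data from the picture) and the helper name `neighbor_counts` is hypothetical. It assumes direction is ignored when collecting neighbors, which is how the UNION-based neighbor lists in the stored procedure that follows treat it.

```python
from collections import defaultdict


def neighbor_counts(edges, calling_party, called_party):
    """Return (common, total) neighbor counts for one connection.

    A neighbor of X is any party that appears in a connection with X,
    regardless of who was calling and who was called.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    calling_n = neighbors[calling_party]
    called_n = neighbors[called_party]
    # Intersection = common neighbors, union = total neighbors.
    return len(calling_n & called_n), len(calling_n | called_n)


# Hypothetical sample data: (calling_party, called_party) pairs.
edges = [("a", "b"), ("a", "c"), ("d", "a"), ("b", "c"), ("b", "d"), ("e", "b")]

common, total = neighbor_counts(edges, "a", "b")
print(common, total)  # 2 5
```

Note that with this definition the union contains both endpoints, because each one is a neighbor of the other. If the intended total excludes them, they would have to be removed before counting.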
To obtain these two values I wrote the following stored procedure:
CREATE PROCEDURE [dbo].[spCountNeighbors]
AS
DECLARE
    @CallingParty varchar(50),
    @CalledParty varchar(50),
    @RecordsUpdated int

SET @CallingParty = 'a'  -- any non-NULL value, so that the loop is entered
SET @RecordsUpdated = 0
PRINT GETDATE()

-- Each iteration picks one connection that has not been updated yet
WHILE @CallingParty IS NOT NULL BEGIN
    SET @CallingParty = NULL
    SELECT TOP 1 @CallingParty = calling_party, @CalledParty = called_party
    FROM monthly_connections
    WHERE common_neighbors IS NULL
    --PRINT @CallingParty
    IF @CallingParty IS NOT NULL BEGIN
        WITH callingPartyNeighbors AS
        (
            SELECT called_party AS neighbor FROM monthly_connections WHERE calling_party = @CallingParty
            UNION
            SELECT calling_party AS neighbor FROM monthly_connections WHERE called_party = @CallingParty
        ),
        calledPartyNeighbors AS
        (
            SELECT calling_party AS neighbor FROM monthly_connections WHERE called_party = @CalledParty
            UNION
            SELECT called_party AS neighbor FROM monthly_connections WHERE calling_party = @CalledParty
        )
        UPDATE mc
        SET common_neighbors = (SELECT COUNT(*) FROM
                (
                    SELECT neighbor FROM callingPartyNeighbors
                    INTERSECT
                    SELECT neighbor FROM calledPartyNeighbors
                ) t1
            ),
            total_neighbors = (SELECT COUNT(*) FROM
                (
                    SELECT neighbor FROM callingPartyNeighbors
                    UNION
                    SELECT neighbor FROM calledPartyNeighbors
                ) t2
            )
        FROM monthly_connections mc
        -- Update the connection in both directions
        WHERE (mc.calling_party = @CallingParty AND mc.called_party = @CalledParty)
           OR (mc.called_party = @CallingParty AND mc.calling_party = @CalledParty);
        SET @RecordsUpdated = @RecordsUpdated + @@ROWCOUNT
        PRINT @RecordsUpdated
    END
END
PRINT @RecordsUpdated
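The procedure above is supposed to go through the connection table, which contains 23M connections, and update common_neighbors and total_neighbors for every row. The problem is that it is too slow: updating 1000 records takes 212 seconds.
I would be very grateful if any of you could suggest changes to the procedure that would speed it up.
Thanks!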
Answer 0 (score: 0)
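Your procedure runs a lot of subqueries, and I think that is the main reason you are losing performance. Couldn't you replace the multiple queries with one big join and then filter it? Something like:
SELECT T.calling_party, T.called_party, A.called_party, B.called_party
FROM monthly_connections AS T
JOIN monthly_connections AS A
    ON T.calling_party = A.calling_party  -- parties called by T's calling party
JOIN monthly_connections AS B
    ON T.called_party = B.calling_party   -- parties called by T's called party
WHERE A.called_party = B.called_party     -- to get the common neighbour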
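You will probably need another join on called_party to get the complete list, but I think this could be faster than iterating over 23M records and running multiple queries for each of them.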
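The join idea can be tried out on a small scale. The sketch below is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory table, the sample rows are made up, the function name `common_neighbor_counts` is hypothetical, and each (calling_party, called_party) pair is assumed to occur at most once. SQLite's dialect differs from T-SQL, and the sketch says nothing about how such a query would perform on 23M rows.

```python
import sqlite3

# "edges" lists every connection in both directions, so that direction
# is ignored when looking up neighbors. UNION removes duplicate pairs.
QUERY = """
WITH edges AS (
    SELECT calling_party AS node, called_party AS neighbor FROM monthly_connections
    UNION
    SELECT called_party, calling_party FROM monthly_connections
)
SELECT mc.calling_party, mc.called_party, COUNT(e2.neighbor) AS common_neighbors
FROM monthly_connections AS mc
LEFT JOIN edges AS e1 ON e1.node = mc.calling_party
LEFT JOIN edges AS e2 ON e2.node = mc.called_party
                     AND e2.neighbor = e1.neighbor
GROUP BY mc.calling_party, mc.called_party
ORDER BY mc.calling_party, mc.called_party
"""


def common_neighbor_counts(rows):
    """Return (calling_party, called_party, common_neighbors) for every row."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE monthly_connections (calling_party TEXT, called_party TEXT)"
        )
        conn.executemany("INSERT INTO monthly_connections VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


# Hypothetical sample data: (calling_party, called_party) pairs.
sample = [("a", "b"), ("a", "c"), ("d", "a"), ("b", "c"), ("b", "d"), ("e", "b")]

for row in common_neighbor_counts(sample):
    print(row)
```

The LEFT JOINs keep connections without any common neighbor, so they come out with a count of 0 rather than disappearing. The total could be derived from the same data as the number of neighbors of the calling party plus the number of neighbors of the called party minus the common count.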
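Answer 1 (score: 0)
The following script produces the same output for common_neighbors as your stored procedure.
Somehow I feel it is not quite what you need (yet), but you may pick up some new ideas from it.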
DECLARE @monthly_connections TABLE (
    calling_party VARCHAR(50)
    , called_party VARCHAR(50)
    , common_neighbors INTEGER
    , total_neighbors INTEGER)

-- Sample data
INSERT INTO @monthly_connections
SELECT '1', '3', NULL, NULL
UNION ALL SELECT '2', '4', NULL, NULL
UNION ALL SELECT '3', '2', NULL, NULL
UNION ALL SELECT '3', '4', NULL, NULL
UNION ALL SELECT '3', '6', NULL, NULL
UNION ALL SELECT '3', '7', NULL, NULL
UNION ALL SELECT '4', '5', NULL, NULL
UNION ALL SELECT '8', '4', NULL, NULL

-- q lists every connection in both directions
;WITH q AS (
    SELECT calling_party, called_party
    FROM @monthly_connections mc1
    UNION ALL
    SELECT called_party, calling_party
    FROM @monthly_connections mc1
)
UPDATE mc
SET common_neighbors = common_neighbors.cnt
FROM @monthly_connections mc
INNER JOIN (
    -- Count the parties reachable from both ends of the connection:
    -- q1 is the connection itself, q2 and q3 close the triangle
    SELECT q1.calling_party, q1.called_party, cnt = COUNT(*)
    FROM q q1
    INNER JOIN q q2 ON q2.calling_party = q1.called_party
    INNER JOIN q q3 ON q3.calling_party = q2.called_party
        AND q3.called_party = q1.calling_party
    GROUP BY
        q1.calling_party, q1.called_party
) common_neighbors ON common_neighbors.calling_party = mc.calling_party
    AND common_neighbors.called_party = mc.called_party

SELECT *
FROM @monthly_connections