Any suggestions for optimizing the following query, which calculates common and total neighbors?

Asked: 2010-11-22 09:14:00

Tags: sql sql-server-2008

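The table consists of the columns calling_party and called_party. Each record describes a connection between two users, where one user takes the role of the calling party and the other the role of the called party.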

The same two users can have two connections. In that case the direction changes, so the calling and called party roles are swapped.

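To the original table (monthly_connections) I added two further columns, common_neighbors and total_neighbors, which store the number of common neighbors and the total number of neighbors. To clarify the terms common and total neighbors, I have added the following picture: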

[Image: example connection and its neighbors]

In this case, the observed connection has 2 neighbors common to the calling and called party, and 6 neighbors in total.

To obtain these two values I wrote the following stored procedure:

CREATE PROCEDURE [dbo].[spCountNeighbors]  

AS

Declare 
@CallingParty varchar(50),
@CalledParty varchar(50),
@RecordsUpdated int

SET @CallingParty ='a'
SET @RecordsUpdated = 0
PRINT GETDATE()
WHILE @CallingParty IS NOT NULL BEGIN
    SET @CallingParty = NULL
    SELECT TOP 1 @CallingParty = calling_party, @CalledParty = called_party FROM    monthly_connections WHERE common_neighbors IS NULL
    --PRINT @CallingParty
    IF @CallingParty IS NOT NULL BEGIN
    WITH callingPartyNeighbors AS
    (
        SELECT called_party as neighbor FROM monthly_connections WHERE calling_party = @CallingParty
        UNION
        SELECT calling_party as neighbor FROM monthly_connections WHERE called_party = @CallingParty
    ),
    calledPartyNeighbors AS
    (
        SELECT calling_party as neighbor FROM monthly_connections WHERE called_party = @CalledParty
        UNION
        SELECT called_party as neighbor FROM monthly_connections WHERE calling_party = @CalledParty
    )

        UPDATE mc SET common_neighbors = (SELECT COUNT (*) FROM
        (
        SELECT neighbor FROM callingPartyNeighbors
        INTERSECT
        SELECT neighbor FROM calledPartyNeighbors
        )
        t1
        ),
        total_neighbors = (SELECT COUNT (*) FROM
        (
        SELECT neighbor FROM callingPartyNeighbors
        UNION
        SELECT neighbor FROM calledPartyNeighbors
        )
        t2
        )
         FROM monthly_connections mc WHERE (mc.calling_party = @CallingParty AND mc.called_party = @CalledParty) OR (mc.called_party = @CallingParty AND mc.calling_party = @CalledParty);
        SET @RecordsUpdated = @RecordsUpdated + @@ROWCOUNT
        PRINT @RecordsUpdated
    END 
END
PRINT @RecordsUpdated

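The procedure above is supposed to go through the connection table, which contains 23M connections, and update common_neighbors and total_neighbors for every row. The problem is that it is far too slow: updating 1000 records takes 212 seconds.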

I would be very grateful if any of you could suggest changes to the procedure above that speed up its execution.

Thanks!

2 answers:

Answer 0: (score: 0)

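Your procedure runs a lot of subqueries, and I think that is the main reason you are losing performance. Couldn't you replace the multiple queries with one big join and then filter it? Something like:
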
SELECT T.calling_party, T.called_party, A.called_party, B.called_party
FROM monthly_connections AS T
JOIN monthly_connections AS A
  ON T.calling_party = A.calling_party   -- parties called by T's calling party
JOIN monthly_connections AS B
  ON T.called_party = B.calling_party    -- parties called by T's called party
WHERE A.called_party = B.called_party    -- keep only the common neighbors

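You will probably need another join on called_party to get the full list, but I think this is likely to be faster than iterating over 23M records and running several queries for each of them.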

Answer 1: (score: 0)

The following script produces the same common_neighbors output as your stored procedure.

Somehow I feel it is not quite what you need (yet), but you might pick up a few new ideas from it.

DECLARE @monthly_connections TABLE (
  calling_party VARCHAR(50)
  , called_party VARCHAR(50)
  , common_neighbors INTEGER
  , total_neighbors INTEGER)

INSERT INTO @monthly_connections
          SELECT '1', '3', NULL, NULL
UNION ALL SELECT '2', '4', NULL, NULL
UNION ALL SELECT '3', '2', NULL, NULL
UNION ALL SELECT '3', '4', NULL, NULL
UNION ALL SELECT '3', '6', NULL, NULL
UNION ALL SELECT '3', '7', NULL, NULL
UNION ALL SELECT '4', '5', NULL, NULL
UNION ALL SELECT '8', '4', NULL, NULL

;WITH q AS (
  -- Treat every connection as undirected: list each pair in both directions
  SELECT  calling_party, called_party
  FROM    @monthly_connections
  UNION ALL
  SELECT  called_party, calling_party
  FROM    @monthly_connections
)
UPDATE  mc
SET     common_neighbors = cn.cnt
FROM    @monthly_connections mc
        INNER JOIN (
          -- q1 is the connection, q2 leads from its called party to a neighbor,
          -- q3 leads from that neighbor back to the calling party
          SELECT  q1.calling_party, q1.called_party, cnt = COUNT(*)
          FROM    q q1
                  INNER JOIN q q2 ON q2.calling_party = q1.called_party
                  INNER JOIN q q3 ON q3.calling_party = q2.called_party
                                     AND q3.called_party = q1.calling_party
          GROUP BY
                  q1.calling_party, q1.called_party
        ) cn ON cn.calling_party = mc.calling_party
                AND cn.called_party = mc.called_party


SELECT *
FROM  @monthly_connections
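
The script above only fills common_neighbors. As a rough sketch of how total_neighbors might be filled in the same set-based pass, the variant below uses |A ∪ B| = |A| + |B| - |A ∩ B|, which matches the UNION and INTERSECT counts in the question's procedure. The CTE names edges, degree and common are made up for this illustration, the sample rows are the ones from the script above, and it has not been tried against the real 23M-row table.

```sql
-- Sample data only; against the real data, use monthly_connections instead.
DECLARE @monthly_connections TABLE (
    calling_party    VARCHAR(50),
    called_party     VARCHAR(50),
    common_neighbors INTEGER,
    total_neighbors  INTEGER);

INSERT INTO @monthly_connections (calling_party, called_party)
          SELECT '1', '3'
UNION ALL SELECT '2', '4'
UNION ALL SELECT '3', '2'
UNION ALL SELECT '3', '4'
UNION ALL SELECT '3', '6'
UNION ALL SELECT '3', '7'
UNION ALL SELECT '4', '5'
UNION ALL SELECT '8', '4';

;WITH edges AS (
    -- Undirected neighbor list. UNION (not UNION ALL) drops the duplicate
    -- that appears when two users are connected in both directions.
    SELECT calling_party AS party, called_party AS neighbor
    FROM   @monthly_connections
    UNION
    SELECT called_party, calling_party
    FROM   @monthly_connections
),
degree AS (
    -- Number of distinct neighbors per user
    SELECT party, COUNT(*) AS cnt
    FROM   edges
    GROUP BY party
),
common AS (
    -- Neighbors shared by both ends of each existing connection
    SELECT mc.calling_party, mc.called_party,
           COUNT(DISTINCT e1.neighbor) AS cnt
    FROM   @monthly_connections mc
           INNER JOIN edges e1 ON e1.party = mc.calling_party
           INNER JOIN edges e2 ON e2.party = mc.called_party
                              AND e2.neighbor = e1.neighbor
    GROUP BY mc.calling_party, mc.called_party
)
UPDATE mc
SET    common_neighbors = ISNULL(c.cnt, 0),
       total_neighbors  = da.cnt + db.cnt - ISNULL(c.cnt, 0)
FROM   @monthly_connections mc
       INNER JOIN degree da ON da.party = mc.calling_party
       INNER JOIN degree db ON db.party = mc.called_party
       LEFT  JOIN common c  ON c.calling_party = mc.calling_party
                           AND c.called_party  = mc.called_party;

SELECT *
FROM   @monthly_connections;
```

The LEFT JOIN on common means that connections without any shared neighbor get 0 instead of staying NULL. On the full table, indexes on calling_party and called_party would presumably matter a lot for the joins, but that is an assumption, not something measured here.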