SQL query like GROUP BY with an OR condition

Asked: 2011-06-08 14:43:17

Tags: sql-server sql-server-2005 tsql recursion

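I'll try to describe the real situation. In our company we have a reservation system with a table, let's call it Customers, where e-mail and phone contacts are saved with each incoming order. That is a part of the system I can't change. I'm facing the problem of how to count unique customers. By a unique customer I mean a group of people who have either the same e-mail or the same phone number.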

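Example 1: From real life, imagine Tom and Sandra, who are married. Tom, who ordered 4 products, entered 3 different e-mail addresses and 2 different phone numbers into our reservation system. One of those numbers is shared with Sandra (as a home phone), so I can assume the two are somehow connected. Besides this shared number, Sandra also entered her private phone number, and she used only one e-mail address for both of her orders. For me this means counting all of the following rows as one unique customer. In fact, such a unique customer may grow into a whole family.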

ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    tom@email.com       +44 111 111    First row
1    tommy@email.com     +44 111 111    Same phone, different e-mail
2    thomas@email.com    +44 111 111    Same phone, different e-mail
3    thomas@email.com    +44 222 222    Same e-mail, different phone
4    sandra@email.com    +44 222 222    Same phone, different e-mail
5    sandra@email.com    +44 333 333    Same e-mail, different phone

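As ypercube said, I will probably need recursion to count all of these unique customers.

Example 2: Here is an example of what I want to do.

Is it possible to get the count of unique customers without recursion, for instance by using a cursor or something similar, or is recursion necessary?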

ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    linsey@email.com    +44 111 111    ─┐
1    louise@email.com    +44 111 111     ├─ 1. unique customer
2    louise@email.com    +44 222 222    ─┘
---- ------------------- -------------- ------------------------------
3    steven@email.com    +44 333 333    ─┐
4    steven@email.com    +44 444 444     ├─ 2. unique customer
5    sandra@email.com    +44 444 444    ─┘
---- ------------------- -------------- ------------------------------
6    george@email.com    +44 555 555    ─── 3. unique customer
---- ------------------- -------------- ------------------------------
7    xavier@email.com    +44 666 666    ─┐
8    xavier@email.com    +44 777 777     ├─ 4. unique customer
9    xavier@email.com    +44 888 888    ─┘
---- ------------------- -------------- ------------------------------
10   robert@email.com    +44 999 999    ─┐
11   miriam@email.com    +44 999 999     ├─ 5. unique customer
12   sherry@email.com    +44 999 999    ─┘
---- ------------------- -------------- ------------------------------
----------------------------------------------------------------------
Result                                  ∑ = 5 unique customers
----------------------------------------------------------------------

I've tried a query with GROUP BY, but I don't know how to group the result by either the first or the second column. I'm looking for something like this:

SELECT COUNT(*) FROM Customers
GROUP BY Email OR Phone

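Thanks again for any suggestions.

P.S. I really appreciate the answers given to this question before the complete rephrase. The answers here may no longer correspond to the update, so if you were going to downvote, please don't do it here (except on the question, of course). I completely rewrote this post.

Thanks, and sorry for my wrong start.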

5 answers:

Answer 0 (score: 1)

To find the groups that share only the same phone:

SELECT
    ID
  , Name
  , Phone
  , DENSE_RANK() OVER (ORDER BY Phone) AS GroupPhone
FROM 
    MyTable
ORDER BY
    GroupPhone
  , ID

To find the groups that share only the same name:

SELECT
    ID
  , Name
  , Phone
  , DENSE_RANK() OVER (ORDER BY Name) AS GroupName
FROM 
    MyTable
ORDER BY
    GroupName
  , ID

Now, for the (complex) query you describe, let's say we have a table like this:

ID   Name          Phone
---- ------------- -------------
0    Kate          +44 333 333
1    Sandra        +44 000 000
2    Thomas        +44 222 222
3    Robert        +44 000 000
4    Thomas        +44 444 444
5    George        +44 222 222
6    Kate          +44 000 000
7    Robert        +44 444 444
--------------------------------

Do all of these belong to one group? Each of them shares a name or a phone with someone else, forming a "chain" of related persons:

0-6   same name
6-1-3 same phone
3-7   same name
7-4   same phone
4-2   same name
2-5   same phone
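
The chain above can be checked mechanically. The following is a minimal sketch in Python rather than T-SQL, assuming that "same group" means connected through any sequence of shared names or phones. It uses a union-find structure; the helper names `find` and `union` are illustrative and do not come from the answer.

```python
# Sample rows from the table above: (ID, Name, Phone).
rows = [
    (0, "Kate",   "+44 333 333"),
    (1, "Sandra", "+44 000 000"),
    (2, "Thomas", "+44 222 222"),
    (3, "Robert", "+44 000 000"),
    (4, "Thomas", "+44 444 444"),
    (5, "George", "+44 222 222"),
    (6, "Kate",   "+44 000 000"),
    (7, "Robert", "+44 444 444"),
]

# Every row starts as the root of its own group.
parent = {row_id: row_id for row_id, _, _ in rows}


def find(x):
    """Return the root of x's group, shortening the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(a, b):
    """Merge the groups of a and b, keeping the smaller ID as the root."""
    root_a, root_b = find(a), find(b)
    if root_a != root_b:
        parent[max(root_a, root_b)] = min(root_a, root_b)


# Link each row to the first row seen with the same name or the same phone.
first_seen = {}
for row_id, name, phone in rows:
    for key in (("name", name), ("phone", phone)):
        if key in first_seen:
            union(row_id, first_seen[key])
        else:
            first_seen[key] = row_id

# Collect the row IDs of each group under its root.
groups = {}
for row_id, _, _ in rows:
    groups.setdefault(find(row_id), []).append(row_id)

print(len(groups), groups)
```

On this sample all eight rows end up under a single root, so under the assumption above they do form one group.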

Answer 1 (score: 1)

Here is a complete solution using a recursive CTE.

;WITH Nodes AS
(
    SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
        , [ID]
    FROM
    (
        SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
        FROM dbo.Customer
        UNION ALL
        SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
        FROM dbo.Customer
    ) A
),
Links AS
(
    SELECT DISTINCT A.Id, B.Id LinkedId
    FROM Nodes A
    JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
),
Routes AS
(
    SELECT DISTINCT Id, Id LinkedId
    FROM dbo.Customer

    UNION ALL

    SELECT DISTINCT Id, LinkedId
    FROM Links

    UNION ALL

    SELECT A.Id, B.LinkedId
    FROM Links A
    JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id
),
TransitiveClosure AS
(
    SELECT Id, Id LinkedId
    FROM Links

    UNION

    SELECT LinkedId Id, LinkedId
    FROM Links

    UNION

    SELECT Id, LinkedId
    FROM Routes
),
UniqueCustomers AS
(
    SELECT Id, MIN(LinkedId) UniqueCustomerId
    FROM TransitiveClosure
    GROUP BY Id
)
SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
FROM dbo.Customer A
JOIN UniqueCustomers B ON B.Id = A.Id
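
Judging by the CTE names, the query aims to label each row with the smallest ID it is connected to. As a cross-check, and as one possible answer to the question of whether recursion is required, here is a hedged sketch of that labelling in Python using a plain loop. It is not a translation of the CTE. The function name `unique_customer_ids` is hypothetical, and the rows are the Example 2 data from the question.

```python
def unique_customer_ids(rows):
    """Map each row ID to the smallest ID in its e-mail/phone group.

    rows is a list of (id, email, phone) tuples. Labels are pushed along
    shared values until nothing changes, so no recursion is involved.
    """
    label = {row[0]: row[0] for row in rows}
    changed = True
    while changed:
        changed = False
        for column in (1, 2):  # 1 = e-mail, 2 = phone
            # Smallest label currently held by any row with each value.
            best = {}
            for row in rows:
                value = row[column]
                best[value] = min(best.get(value, label[row[0]]), label[row[0]])
            # Give every row the smallest label among rows sharing its value.
            for row in rows:
                if best[row[column]] < label[row[0]]:
                    label[row[0]] = best[row[column]]
                    changed = True
    return label


# Example 2 from the question: (ID, E-mail, Phone).
example2 = [
    (0, "linsey@email.com", "+44 111 111"),
    (1, "louise@email.com", "+44 111 111"),
    (2, "louise@email.com", "+44 222 222"),
    (3, "steven@email.com", "+44 333 333"),
    (4, "steven@email.com", "+44 444 444"),
    (5, "sandra@email.com", "+44 444 444"),
    (6, "george@email.com", "+44 555 555"),
    (7, "xavier@email.com", "+44 666 666"),
    (8, "xavier@email.com", "+44 777 777"),
    (9, "xavier@email.com", "+44 888 888"),
    (10, "robert@email.com", "+44 999 999"),
    (11, "miriam@email.com", "+44 999 999"),
    (12, "sherry@email.com", "+44 999 999"),
]

labels = unique_customer_ids(example2)
print(len(set(labels.values())))  # number of unique customers
```

For Example 2 this prints 5, matching the expected result in the question. The same loop could be written in T-SQL as repeated UPDATE statements inside a WHILE loop, though that variant is not shown or tested here.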

Answer 2 (score: 0)

I don't know whether this is the best solution, but here it is:

SELECT
  MyTable.ID, MyTable.Name, MyTable.Phone,
  CASE WHEN N.No = 1 AND P.No = 1 THEN 1
       WHEN N.No = 1 AND P.No > 1 THEN 2
       WHEN N.No > 1 OR P.No > 1  THEN 3
  END as GroupRes
FROM
  MyTable 
  JOIN (SELECT Name, count(Name) No FROM MyTable GROUP BY Name) N on MyTable.Name = N.Name
  JOIN (SELECT Phone, count(Phone) No FROM MyTable GROUP BY Phone) P on MyTable.Phone = P.Phone

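The problem is that there are some joins on varchar columns here, which could end up increasing execution time.

Answer 3 (score: 0)

For the dataset in the example, you could write something like this: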

;WITH Temp AS (
    SELECT Name, Phone,
        DENSE_RANK() OVER (ORDER BY Name) AS NameGroup,
        DENSE_RANK() OVER (ORDER BY Phone) AS PhoneGroup
    FROM MyTable)
SELECT MAX(Phone), MAX(Name), COUNT(*)
FROM Temp
GROUP BY NameGroup, PhoneGroup
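
Note that GROUP BY NameGroup, PhoneGroup puts two rows in the same group only when both their name and their phone match. The small Python sketch below counts the distinct (name, phone) pairs to show how many groups that produces. It uses the sample table from answer 0, which is an assumption, since this answer does not say which dataset it refers to.

```python
from collections import Counter

# Sample rows from answer 0: (ID, Name, Phone).
rows = [
    (0, "Kate",   "+44 333 333"),
    (1, "Sandra", "+44 000 000"),
    (2, "Thomas", "+44 222 222"),
    (3, "Robert", "+44 000 000"),
    (4, "Thomas", "+44 444 444"),
    (5, "George", "+44 222 222"),
    (6, "Kate",   "+44 000 000"),
    (7, "Robert", "+44 444 444"),
]

# Grouping by both columns is the same as grouping by the (name, phone) pair.
pair_counts = Counter((name, phone) for _, name, phone in rows)

print(len(pair_counts))  # number of groups when both columns must match
```

On that table every pair is distinct, so the pairwise grouping yields 8 groups of one row each. Linking rows by name OR phone would instead merge them into one chain.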

Answer 4 (score: 0)

Here is my solution:

SELECT p.LastName, P.FirstName, P.HomePhone,
CASE 
    WHEN ph.PhoneCount=1 THEN       
        CASE 
            WHEN n.NameCount=1 THEN 'unique name and phone'
            ELSE 'common name'
        END

    ELSE        
        CASE 
            WHEN n.NameCount=1 THEN 'common phone'
            ELSE 'common phone and name'        
        END             
END
FROM Contacts p
INNER JOIN 
(SELECT HomePhone, count(LastName) as PhoneCount
FROM Contacts
GROUP BY HomePhone) ph ON ph.HomePhone = p.HomePhone

INNER JOIN 
(SELECT FirstName, count(LastName) as NameCount
FROM Contacts
GROUP BY FirstName) n ON n.FirstName = p.FirstName


LastN       FirstN   Phone       Comment
----------- -------- ----------- ----------------------
Hoover      Brenda   8138282334  unique name and phone
Washington  Brian    9044563211  common name
Roosevelt   Brian    7737653279  common name
Reagan      Charles  7734567869  unique name and phone