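I will try to describe the real situation. In our company we have a reservation system with a table, let's call it Customers, where the e-mail and phone contacts are saved with each incoming order. This is a part of the system I cannot change. I am facing the problem of how to count unique customers. By a unique customer I mean a group of people who share either the same e-mail or the same phone number.
Example 1: From real life, imagine Tom and Sandra, who are married. Tom ordered 4 products and filled in 3 different e-mail addresses and 2 different phone numbers in our reservation system. One of those phone numbers is shared with Sandra (as a home phone), so I can assume the two are connected somehow. Besides this shared phone number, Sandra also filled in her private phone number, and she used only one e-mail address for both of her orders. For me this means counting all of the following rows as one unique customer. In practice, such a unique customer may grow into a whole family.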
ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    tom@email.com       +44 111 111    First row
1    tommy@email.com     +44 111 111    Same phone, different e-mail
2    thomas@email.com    +44 111 111    Same phone, different e-mail
3    thomas@email.com    +44 222 222    Same e-mail, different phone
4    sandra@email.com    +44 222 222    Same phone, different e-mail
5    sandra@email.com    +44 333 333    Same e-mail, different phone
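As ypercube said, I will probably need recursion to count all these unique customers.
Example 2: Here is an example of what I want to do.
Is it possible to get the count of unique customers without recursion, for instance by using a cursor or something similar, or is recursion necessary?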
ID   E-mail              Phone          Comment
---- ------------------- -------------- ------------------------------
0    linsey@email.com    +44 111 111    ─┐
1    louise@email.com    +44 111 111     ├─ 1. unique customer
2    louise@email.com    +44 222 222    ─┘
---- ------------------- -------------- ------------------------------
3    steven@email.com    +44 333 333    ─┐
4    steven@email.com    +44 444 444     ├─ 2. unique customer
5    sandra@email.com    +44 444 444    ─┘
---- ------------------- -------------- ------------------------------
6    george@email.com    +44 555 555    ─── 3. unique customer
---- ------------------- -------------- ------------------------------
7    xavier@email.com    +44 666 666    ─┐
8    xavier@email.com    +44 777 777     ├─ 4. unique customer
9    xavier@email.com    +44 888 888    ─┘
---- ------------------- -------------- ------------------------------
10   robert@email.com    +44 999 999    ─┐
11   miriam@email.com    +44 999 999     ├─ 5. unique customer
12   sherry@email.com    +44 999 999    ─┘
---- ------------------- -------------- ------------------------------
----------------------------------------------------------------------
Result ∑ = 5 unique customers
----------------------------------------------------------------------
I have tried queries with GROUP BY, but I don't know how to group the result by either the first or the second column. I am looking for something like:
SELECT COUNT(*) FROM Customers
GROUP BY Email OR Phone
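Thanks again for any suggestions.
P.S.
I really appreciate the answers that were posted before the complete rewrite of this question. They may no longer match the updated text, so please don't downvote them if you were planning to (the question itself is fair game, of course). I have completely rewritten this post.
Thanks, and sorry for my bad start.
Answer 0 (score: 1)
Find groups that share only the same phone: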
SELECT
ID
, Name
, Phone
, DENSE_RANK() OVER (ORDER BY Phone) AS GroupPhone
FROM
MyTable
ORDER BY
GroupPhone
, ID
Find groups that share only the same name:
SELECT
ID
, Name
, Phone
, DENSE_RANK() OVER (ORDER BY Name) AS GroupName
FROM
MyTable
ORDER BY
GroupName
, ID
Now, for the (complex) query you describe, suppose we have a table like this:
ID   Name          Phone
---- ------------- -------------
0    Kate          +44 333 333
1    Sandra        +44 000 000
2    Thomas        +44 222 222
3    Robert        +44 000 000
4    Thomas        +44 444 444
5    George        +44 222 222
6    Kate          +44 000 000
7    Robert        +44 444 444
--------------------------------
Do all of these belong to one group? Each of them shares a name or a phone with someone else, forming a "chain" of related people:
0-6 same name
6-1-3 same phone
3-7 same name
7-4 same phone
4-2 same name
2-5 same phone
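The chain above can be checked mechanically. The following is a minimal sketch in Python, not part of the original answer: the `rows` list and the `reachable_from` helper are illustrative names. It treats every row as a node, links rows that share a name or a phone, and walks the resulting graph breadth-first from row 0.

```python
from collections import defaultdict, deque

# Sample rows from the table above: (id, name, phone).
rows = [
    (0, "Kate", "+44 333 333"),
    (1, "Sandra", "+44 000 000"),
    (2, "Thomas", "+44 222 222"),
    (3, "Robert", "+44 000 000"),
    (4, "Thomas", "+44 444 444"),
    (5, "George", "+44 222 222"),
    (6, "Kate", "+44 000 000"),
    (7, "Robert", "+44 444 444"),
]


def reachable_from(start, rows):
    """Return the ids of all rows connected to `start` via shared names or phones."""
    # Index: which rows use a given name or phone value.
    by_value = defaultdict(list)
    for row_id, name, phone in rows:
        by_value[("name", name)].append(row_id)
        by_value[("phone", phone)].append(row_id)
    # Reverse index: which values a given row uses.
    values_of = {
        row_id: [("name", name), ("phone", phone)] for row_id, name, phone in rows
    }

    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for value in values_of[current]:
            for neighbour in by_value[value]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return seen


group = reachable_from(0, rows)
print(sorted(group))
```

If the printed list contains every ID in the table, all rows form a single group under the "same name or same phone" rule.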
Answer 1 (score: 1)
Here is a complete solution using a recursive CTE.
;WITH Nodes AS
(
    -- One SetId per distinct e-mail (Part 1) and per distinct phone (Part 2),
    -- so every row belongs to two sets.
    SELECT DENSE_RANK() OVER (ORDER BY Part, PartRank) SetId
         , [ID]
    FROM
    (
        SELECT [ID], 1 Part, DENSE_RANK() OVER (ORDER BY [E-mail]) PartRank
        FROM dbo.Customer
        UNION ALL
        SELECT [ID], 2, DENSE_RANK() OVER (ORDER BY Phone) PartRank
        FROM dbo.Customer
    ) A
),
Links AS
(
    -- Direct links: two rows in the same set, pointing from the higher Id to the lower one.
    SELECT DISTINCT A.Id, B.Id LinkedId
    FROM Nodes A
    JOIN Nodes B ON B.SetId = A.SetId AND B.Id < A.Id
),
Routes AS
(
    -- Anchor members: every row reaches itself and its direct links.
    SELECT DISTINCT Id, Id LinkedId
    FROM dbo.Customer
    UNION ALL
    SELECT DISTINCT Id, LinkedId
    FROM Links
    UNION ALL
    -- Recursive member: follow links towards ever lower Ids.
    SELECT A.Id, B.LinkedId
    FROM Links A
    JOIN Routes B ON B.Id = A.LinkedId AND B.LinkedId < A.Id
),
TransitiveClosure AS
(
    SELECT Id, Id LinkedId
    FROM Links
    UNION
    SELECT LinkedId Id, LinkedId
    FROM Links
    UNION
    SELECT Id, LinkedId
    FROM Routes
),
UniqueCustomers AS
(
    -- The lowest reachable Id identifies the unique customer.
    SELECT Id, MIN(LinkedId) UniqueCustomerId
    FROM TransitiveClosure
    GROUP BY Id
)
SELECT A.Id, A.[E-mail], A.Phone, B.UniqueCustomerId
FROM dbo.Customer A
JOIN UniqueCustomers B ON B.Id = A.Id
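For cross-checking the query's output outside the database, here is a hedged sketch in Python that computes the same kind of result (each row mapped to the lowest ID in its group) with a union-find structure. This is a different technique from the recursive CTE, not a translation of it, and `customers` and `unique_customer_ids` are illustrative names. As far as I can tell, the links in the query only point from a higher Id to a lower one, so a case where two low-Id rows are connected only through a higher-Id row (for example rows a/p1, b/p2, a/p2) may be worth testing separately; the sketch handles that shape.

```python
# Rows from Example 2 of the question: (id, e-mail, phone).
customers = [
    (0, "linsey@email.com", "+44 111 111"),
    (1, "louise@email.com", "+44 111 111"),
    (2, "louise@email.com", "+44 222 222"),
    (3, "steven@email.com", "+44 333 333"),
    (4, "steven@email.com", "+44 444 444"),
    (5, "sandra@email.com", "+44 444 444"),
    (6, "george@email.com", "+44 555 555"),
    (7, "xavier@email.com", "+44 666 666"),
    (8, "xavier@email.com", "+44 777 777"),
    (9, "xavier@email.com", "+44 888 888"),
    (10, "robert@email.com", "+44 999 999"),
    (11, "miriam@email.com", "+44 999 999"),
    (12, "sherry@email.com", "+44 999 999"),
]


def unique_customer_ids(customers):
    """Map each row id to the smallest row id in its connected group."""
    parent = {row_id: row_id for row_id, _, _ in customers}

    def find(x):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as the root so it doubles as the group id.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (kind, value) -> first row id that used this value
    for row_id, email, phone in customers:
        for key in (("email", email), ("phone", phone)):
            if key in first_seen:
                union(row_id, first_seen[key])
            else:
                first_seen[key] = row_id

    return {row_id: find(row_id) for row_id in parent}


groups = unique_customer_ids(customers)
print(len(set(groups.values())))  # number of unique customers
```

The number of distinct values in `groups` is the unique-customer count, and the mapping itself can be compared row by row with the `UniqueCustomerId` column returned by the query.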
Answer 2 (score: 0)
I don't know whether this is the best solution, but here it is:
SELECT
MyTable.ID, MyTable.Name, MyTable.Phone,
CASE WHEN N.No = 1 AND P.No = 1 THEN 1
WHEN N.No = 1 AND P.No > 1 THEN 2
WHEN N.No > 1 OR P.No > 1 THEN 3
END as GroupRes
FROM
MyTable
JOIN (SELECT Name, count(Name) No FROM MyTable GROUP BY Name) N on MyTable.Name = N.Name
JOIN (SELECT Phone, count(Phone) No FROM MyTable GROUP BY Phone) P on MyTable.Phone = P.Phone
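The problem is that this query joins on varchar columns, which may increase execution time.
Answer 3 (score: 0)
For the data set in the example, you could write something like this: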
;WITH Temp AS (
SELECT Name, Phone,
DENSE_RANK() OVER (ORDER BY Name) AS NameGroup,
DENSE_RANK() OVER (ORDER BY Phone) AS PhoneGroup
FROM MyTable)
SELECT MAX(Phone), MAX(Name), COUNT(*)
FROM Temp
GROUP BY NameGroup, PhoneGroup
Answer 4 (score: 0)
Here is my solution:
SELECT p.LastName, P.FirstName, P.HomePhone,
CASE
WHEN ph.PhoneCount=1 THEN
CASE
WHEN n.NameCount=1 THEN 'unique name and phone'
ELSE 'common name'
END
ELSE
CASE
WHEN n.NameCount=1 THEN 'common phone'
ELSE 'common phone and name'
END
END
FROM Contacts p
INNER JOIN
(SELECT HomePhone, count(LastName) as PhoneCount
FROM Contacts
GROUP BY HomePhone) ph ON ph.HomePhone = p.HomePhone
INNER JOIN
(SELECT FirstName, count(LastName) as NameCount
FROM Contacts
GROUP BY FirstName) n ON n.FirstName = p.FirstName
LastN       FirstN   Phone       Comment
Hoover      Brenda   8138282334  unique name and phone
Washington  Brian    9044563211  common name
Roosevelt   Brian    7737653279  common name
Reagan      Charles  7734567869  unique name and phone