Here is the dummy data, a table of phone call records.
Here is a glimpse of it:
| call_id | customer | company | call_start |
|-----------|--------------|-------------|---------------------|
|1411482360 | 001143792042 | 08444599175 | 2014-07-31 13:55:03 |
|1476992122 | 001143792042 | 08441713191 | 2014-07-31 14:05:10 |
The customer and company fields hold their phone numbers.
Edit:
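- Customer A calls company A.
- If customer A then calls company B, company B gets +1 gain and company A gets +1 lost.
- If customer A then calls company C, company C gets +1 gain and company B gets +1 lost.
- If customer A calls company C again, gain/lost is not affected.
- Gain/lost only comes into play once customer A makes a second call.
- If a customer calls companies in this order: A, B, B, C, A, A, C, B, D, the flow should look like this: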
A ->
B -> B +1 gain, A +1 lost
B ->
C -> C +1 gain, B +1 lost
A -> A +1 gain, C +1 lost
A ->
C -> C +1 gain, A +1 lost
B -> B +1 gain, C +1 lost
D -> D +1 gain, B +1 lost
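After this process, the totals should be:
| Company | Total gain | Total lost |
|---------|------------|------------|
| A       | 1          | 2          |
| B       | 2          | 2          |
| C       | 2          | 2          |
| D       | 1          | 0          |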
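The rules above can be checked with a small Python sketch for a single customer. The function name gains_and_losses is hypothetical and companies are plain labels rather than phone numbers; a call only counts when it goes to a different company than the customer's previous call.

```python
from collections import Counter


def gains_and_losses(calls):
    """Apply the question's rules to one customer's calls, given in call order.

    A call counts only when it goes to a different company than the
    customer's previous call: the new company gains 1, the previous loses 1.
    """
    gains, losses = Counter(), Counter()
    previous = None
    for company in calls:
        if previous is not None and company != previous:
            gains[company] += 1
            losses[previous] += 1
        previous = company
    return gains, losses


gains, losses = gains_and_losses(["A", "B", "B", "C", "A", "A", "C", "B", "D"])
for company in sorted(set(gains) | set(losses)):
    # Counter returns 0 for companies that never gained or lost
    print(company, gains[company], losses[company])
```

Repeated calls to the same company (B, B and A, A) fall through the `company != previous` test, which is what keeps them from counting.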
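I started working on this, but it's wrong; it's just an idea, and it doesn't give me gain and lost separately according to the conditions above: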
DROP TABLE IF EXISTS GetTotalGainAndLost;
CREATE TEMPORARY TABLE IF NOT EXISTS GetTotalGainAndLost
AS
(
SELECT SUM(count) as 'TotalGainAndLost', `date`, DAY(`date`) as 'DAY'
FROM (SELECT count(*) as 'count', customer, `date`
FROM (SELECT customer, company, count(*) AS 'count', DATE_FORMAT(`call_end`,'%Y-%m-%d') as 'date'
FROM calls
WHERE `call_end` LIKE CONCAT(2014, '-', RIGHT(CAST(concat('0', 01) AS CHAR),2),'-%')
GROUP BY customer, company, DAY(`call_end`) ORDER BY `call_end` ASC)
as tbl1 group by customer, `date` having count(*) > 1)
as tbl2 GROUP by `date`
);
Select * from GetTotalGainAndLost;
DROP TABLE GetTotalGainAndLost;
This query doesn't return any results.
There should be one row per company and date (total gained and lost calls per day, e.g. for January):
| company | totalGain | totalLost | date | DAY |
|-------------|------------|-------------|--------------|-------|
| 08444599175 | 17 | 6 | 2014-07-01 | 1 |
| 08444599175 | 12 | 10 | 2014-07-02 | 2 |
| 08444599175 | 3 | 6 | 2014-07-03 | 3 |
| 08444599175 | .... | ... | ... | ... |
| 08444599175 | 7 | 6 | 2014-07-31 | 31 |
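Answer 0 (score: 5)
Let N be the number of times a company appears. We can simplify the formula into three simple rules:
- Gain = Lost = N.
- If the company received a customer's first call, subtract 1 from its gain.
- If the company received a customer's last call, subtract 1 from its lost.
In your example (A, B, B, C, A, A, C, B, D), this gives:
| Company | Gain | Lost |
|---------|------|------|
| A       | 2    | 3    |
| B       | 3    | 3    |
| C       | 2    | 2    |
| D       | 1    | 0    |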
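A minimal Python sketch of this simplification for one customer (the function name simplified_totals is hypothetical): every call gives its company one gain and one loss, then the first call's gain and the last call's loss are taken back. Compared with the totals in the question, A and B each come out one higher in both columns, because the repeated calls B, B and A, A are counted as a gain plus a loss instead of being ignored; each company's net value (gain minus lost) is unchanged.

```python
from collections import Counter


def simplified_totals(calls):
    """Answer 0's rules for one customer's calls, given in call order."""
    n = Counter(calls)              # N: how many times each company appears
    gain, lost = Counter(n), Counter(n)
    if calls:
        gain[calls[0]] -= 1         # the first company gained nothing
        lost[calls[-1]] -= 1        # the last company lost nothing
    return {company: (gain[company], lost[company]) for company in sorted(n)}


print(simplified_totals(["A", "B", "B", "C", "A", "A", "C", "B", "D"]))
```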
First, we count each company's calls per day:
SELECT
company, COUNT(*) AS gain, COUNT(*) AS lost, DATE(call_start) AS date
FROM calls
GROUP BY DATE(call_start), company
Next, for each customer and day we find the company that received the first call, and subtract it from that company's gain:
SELECT company, -COUNT(*) AS gain, 0 AS lost, DATE(call_start) AS `date`
FROM calls INNER JOIN (
SELECT MIN(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
) AS t ON (calls.call_id = t.call_id)
GROUP BY DATE(call_start), calls.company
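Then we do the same for the company that received each customer's last call of the day, subtracting it from the lost count: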
SELECT company, 0 AS gain, -COUNT(*) AS lost, DATE(call_start) AS `date`
FROM calls INNER JOIN (
SELECT MAX(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
) AS t ON (calls.call_id = t.call_id)
GROUP BY DATE(call_start), calls.company
Finally, we combine everything with UNION ALL and do another GROUP BY:
SELECT company, SUM(gain) AS gain, SUM(lost) AS lost, `date` FROM (
(
SELECT
company, COUNT(*) AS gain, COUNT(*) AS lost, DATE(call_start) AS `date`
FROM calls
GROUP BY DATE(call_start), company
) UNION ALL (
SELECT company, -COUNT(*) AS gain, 0 AS lost, DATE(call_start) AS `date`
FROM calls INNER JOIN (
SELECT MIN(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
) AS t ON (calls.call_id = t.call_id)
GROUP BY DATE(call_start), calls.company
) UNION ALL (
SELECT company, 0 AS gain, -COUNT(*) AS lost, DATE(call_start) AS `date`
FROM calls INNER JOIN (
SELECT MAX(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
) AS t ON (calls.call_id = t.call_id)
GROUP BY DATE(call_start), calls.company
)
) AS t
GROUP BY `date`, company
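The query above treats each day independently. It also assumes that call_id increases with call_start, since MIN(call_id) and MAX(call_id) are used to pick a customer's first and last call of the day. For example, with calls spanning two days, the result would be:
| Company | Gain | Lost | Day |
|---------|------|------|-----|
| A       | 0    | 1    | 1   |
| B       | 1    | 1    | 1   |
| C       | 1    | 0    | 1   |
| D       | 0    | 1    | 2   |
| E       | 1    | 0    | 2   |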
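To check the per-day behaviour, here is a sketch that runs the combined query in SQLite through Python's sqlite3 module. Assumptions: SQLite does not accept parenthesised SELECTs inside UNION ALL, so the three parts are written without the parentheses (and an ORDER BY is added for stable output); the input is a hypothetical customer who calls A, B, C on one day and D, E on the next. The query reports the date rather than the day number.

```python
import sqlite3

# Answer 0's combined query, adapted to SQLite (no parentheses around the
# UNION ALL members; an ORDER BY added for deterministic output).
QUERY = """
SELECT company, SUM(gain) AS gain, SUM(lost) AS lost, `date` FROM (
    SELECT company, COUNT(*) AS gain, COUNT(*) AS lost, DATE(call_start) AS `date`
    FROM calls
    GROUP BY DATE(call_start), company
    UNION ALL
    SELECT company, -COUNT(*) AS gain, 0 AS lost, DATE(call_start) AS `date`
    FROM calls INNER JOIN (
        SELECT MIN(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
    ) AS t ON (calls.call_id = t.call_id)
    GROUP BY DATE(call_start), calls.company
    UNION ALL
    SELECT company, 0 AS gain, -COUNT(*) AS lost, DATE(call_start) AS `date`
    FROM calls INNER JOIN (
        SELECT MAX(call_id) AS call_id FROM calls GROUP BY DATE(call_start), customer
    ) AS t ON (calls.call_id = t.call_id)
    GROUP BY DATE(call_start), calls.company
) AS t
GROUP BY `date`, company
ORDER BY `date`, company
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id INTEGER, customer TEXT, company TEXT, call_start TEXT)")
# Hypothetical data: one customer, A -> B -> C on day 1, D -> E on day 2
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [
        (1, "cust1", "A", "2014-07-01 09:00:00"),
        (2, "cust1", "B", "2014-07-01 10:00:00"),
        (3, "cust1", "C", "2014-07-01 11:00:00"),
        (4, "cust1", "D", "2014-07-02 09:00:00"),
        (5, "cust1", "E", "2014-07-02 10:00:00"),
    ],
)
for row in conn.execute(QUERY):
    print(row)  # (company, gain, lost, date)
```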
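Answer 1 (score: 3)
This should work (the code below uses SQL Server / T-SQL syntax):
- CTEGains counts how many times each company appears for each customer and date.
- CTEFirst records which company received the customer's first call that day.
- CTELast records which company received the customer's last call that day.
The code then follows the logic you described.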
-- SQL Server (T-SQL): names starting with # are session-local temporary tables
CREATE TABLE #CTEGains (RNo int, customer varchar(14), company varchar(16), startdate date, gains int);
CREATE TABLE #CTEFirst (customer varchar(14), call_start date, company varchar(16));
CREATE TABLE #CTELast (customer varchar(14), call_start date, company varchar(16));

-- One row per call, numbered per customer in call order
INSERT INTO #CTEGains
SELECT ROW_NUMBER() OVER (PARTITION BY customer ORDER BY call_start) RNo, customer, company, CONVERT(date, call_start) startdate, COUNT(company) gains
FROM calls
GROUP BY customer, company, CONVERT(date, call_start), call_start;

-- Company that received each customer's first call of the day
INSERT INTO #CTEFirst
SELECT customer, call_date, company
FROM (SELECT customer, CONVERT(date, call_start) call_date, company,
             ROW_NUMBER() OVER (PARTITION BY customer, CONVERT(date, call_start) ORDER BY call_start) rn
      FROM calls) t
WHERE rn = 1;

-- Company that received each customer's last call of the day
INSERT INTO #CTELast
SELECT customer, call_date, company
FROM (SELECT customer, CONVERT(date, call_start) call_date, company,
             ROW_NUMBER() OVER (PARTITION BY customer, CONVERT(date, call_start) ORDER BY call_start DESC) rn
      FROM calls) t
WHERE rn = 1;

SELECT c1.company,
       SUM(gains) - CASE WHEN EXISTS (SELECT * FROM #CTEGains c2 WHERE c2.customer = MAX(c1.customer) AND MAX(c1.RNo) = c2.RNo - 1 AND c1.company = c2.company AND c1.startdate = c2.startdate) THEN 1 ELSE 0 END -- didn't gain: same company called again
       - CASE WHEN EXISTS (SELECT * FROM #CTEFirst c2 WHERE c2.company = c1.company AND c2.call_start = c1.startdate) THEN 1 ELSE 0 END TotalGain -- didn't gain: first company of the day
     , SUM(gains) - CASE WHEN EXISTS (SELECT * FROM #CTEGains c2 WHERE c2.customer = MAX(c1.customer) AND MAX(c1.RNo) = c2.RNo - 1 AND c1.company = c2.company AND c1.startdate = c2.startdate) THEN 1 ELSE 0 END -- didn't lose: same company as the previous call
       - CASE WHEN EXISTS (SELECT * FROM #CTELast c2 WHERE c2.company = c1.company AND c2.call_start = c1.startdate) THEN 1 ELSE 0 END TotalLost -- didn't lose: last company of the day
     , startdate [date], DATEPART(DAY, startdate) [Day]
FROM #CTEGains c1
GROUP BY c1.company, c1.startdate;

DROP TABLE #CTEFirst;
DROP TABLE #CTEGains;
DROP TABLE #CTELast;
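CTEFirst and CTELast need the company of the earliest and latest call, not MIN(company)/MAX(company), which compare phone numbers as strings. A small Python sketch (the function first_and_last_company is hypothetical) shows the difference on the two sample rows from the question: MIN(company) would return 08441713191, which actually received the later call.

```python
def first_and_last_company(calls):
    """Map (customer, day) -> (first company, last company) called that day.

    `calls` holds (customer, company, call_start) tuples, with call_start as
    'YYYY-MM-DD HH:MM:SS' text so that string order is time order.
    """
    result = {}
    for customer, company, call_start in sorted(calls, key=lambda call: call[2]):
        key = (customer, call_start[:10])  # the calendar day
        first = result[key][0] if key in result else company
        result[key] = (first, company)  # the latest call seen so far is the last
    return result


sample = [
    ("001143792042", "08444599175", "2014-07-31 13:55:03"),
    ("001143792042", "08441713191", "2014-07-31 14:05:10"),
]
print(first_and_last_company(sample))
print(min(company for _, company, _ in sample))  # what MIN(company) would pick
```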
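Answer 2 (score: 3)
I think the simplest way is to use two queries. First, we get the total gains by counting each call a customer makes to a different company: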
select g.company company, count(g.call_id) gain
from calls c
join calls g on c.customer = g.customer and c.company <> g.company and c.call_start < g.call_start
left join calls m on g.customer = m.customer and g.call_start > m.call_start and m.call_start > c.call_start
where m.call_id is null
group by g.company;
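The left join is needed so that extra gains aren't counted when a customer calls several companies in a row: if a customer calls companies a, b and c, company c should get only one gain, not two.
The total losses use the same approach: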
select l.company company, count(l.call_id) lost
from calls c
join calls l on c.customer = l.customer and c.company <> l.company and c.call_start > l.call_start
left join calls m on l.customer = m.customer and c.call_start > m.call_start and l.call_start < m.call_start
where m.call_id is null
group by l.company;
Here is a small demo of the solution: http://sqlfiddle.com/#!2/3236ab/7
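As a sketch under stated assumptions, the two queries can be run in SQLite from Python on the question's sequence A, B, B, C, A, A, C, B, D for one hypothetical customer. In this version the left joins match any intervening call (there is no `g.company <> m.company` / `l.company <> m.company` test), so each call is compared only with the call immediately before it. That reproduces the question's totals; keeping the company test in the left join would also credit the repeated calls B, B and A, A.

```python
import sqlite3

# Gains: g counts when the call c right before it (no call m in between)
# went to a different company.
GAINS = """
select g.company company, count(g.call_id) gain
from calls c
join calls g on c.customer = g.customer and c.company <> g.company and c.call_start < g.call_start
left join calls m on g.customer = m.customer and g.call_start > m.call_start and m.call_start > c.call_start
where m.call_id is null
group by g.company
"""

# Losses: l counts when the call c right after it went to a different company.
LOSSES = """
select l.company company, count(l.call_id) lost
from calls c
join calls l on c.customer = l.customer and c.company <> l.company and c.call_start > l.call_start
left join calls m on l.customer = m.customer and c.call_start > m.call_start and l.call_start < m.call_start
where m.call_id is null
group by l.company
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id INTEGER, customer TEXT, company TEXT, call_start TEXT)")
sequence = ["A", "B", "B", "C", "A", "A", "C", "B", "D"]
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [(i, "cust1", company, f"2014-07-31 {10 + i:02d}:00:00")
     for i, company in enumerate(sequence)],
)

gains = dict(conn.execute(GAINS).fetchall())
losses = dict(conn.execute(LOSSES).fetchall())
print(gains)
print(losses)  # D has no row: it never lost a call
```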
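Answer 3 (score: 2)
Let's start with some definitions.
We have introduced the concepts of first and last calls, which means we need to define a total order on our set of calls. Any rule will do, but for the purposes of this explanation I'll assume calls are ordered by start time and, for equal start times, by id. In other words:
- if callA.startTime < callB.startTime, then callA < callB
- if callA.startTime = callB.startTime and callA.id < callB.id, then callA < callB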
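In code, this total order is a tuple comparison on (start time, id). A small Python sketch (the Call type and helper names are hypothetical); in SQL the same test for "a comes before b" is a.call_start < b.call_start OR (a.call_start = b.call_start AND a.call_id < b.call_id). In the example, c comes after a even though its id is smaller, which a condition like call_start >= ... AND call_id > ... would miss.

```python
from typing import NamedTuple


class Call(NamedTuple):
    call_id: int
    call_start: str  # 'YYYY-MM-DD HH:MM:SS', so string order is time order


def order_key(call: Call) -> tuple:
    """Sort key for the total order: start time first, then id on ties."""
    return (call.call_start, call.call_id)


def precedes(a: Call, b: Call) -> bool:
    """True when call a comes before call b in that total order."""
    return order_key(a) < order_key(b)


a = Call(7, "2014-07-31 13:55:03")
b = Call(3, "2014-07-31 13:55:03")  # same start time, smaller id
c = Call(1, "2014-07-31 14:05:10")  # later start time, smallest id
print(sorted([c, a, b], key=order_key))
```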
Note how we can get all non-first calls in the set with the following query:
SELECT *
FROM calls AS non_first_calls
JOIN calls
  ON non_first_calls.customer = calls.customer
  AND (non_first_calls.call_start > calls.call_start
       OR (non_first_calls.call_start = calls.call_start
           AND non_first_calls.call_id > calls.call_id))
(The query output contains duplicates, i.e. a call can appear more than once.)
Similarly, we can get all non-last calls as follows:
SELECT *
FROM calls AS non_last_calls
JOIN calls
  ON non_last_calls.customer = calls.customer
  AND (non_last_calls.call_start < calls.call_start
       OR (non_last_calls.call_start = calls.call_start
           AND non_last_calls.call_id < calls.call_id))
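Business logic
A company gets +1 gain every time a customer calls it after having made any other call. This means that, for any given company, its gains equal the number of non-first calls it receives. Likewise, a company's losses equal the number of non-last calls it receives.
The query
So, for each company, we just need to count the non-first and non-last calls it received.
The "for each company" part means we need the complete list of companies, which we can get with this query:
SELECT DISTINCT company FROM calls
Putting it all together: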
SELECT
  -- The company
  companies.company
  -- How many non-first calls (gains) it has received
  ,(SELECT COUNT(DISTINCT non_first_calls.call_id)
    FROM calls AS non_first_calls
    JOIN calls
      ON non_first_calls.customer = calls.customer
      AND (non_first_calls.call_start > calls.call_start
           OR (non_first_calls.call_start = calls.call_start
               AND non_first_calls.call_id > calls.call_id))
    WHERE non_first_calls.company = companies.company
  ) gains
  -- How many non-last calls (losses) it has received
  ,(SELECT COUNT(DISTINCT non_last_calls.call_id)
    FROM calls AS non_last_calls
    JOIN calls
      ON non_last_calls.customer = calls.customer
      AND (non_last_calls.call_start < calls.call_start
           OR (non_last_calls.call_start = calls.call_start
               AND non_last_calls.call_id < calls.call_id))
    WHERE non_last_calls.company = companies.company
  ) losses
-- From the set of all companies
FROM (SELECT DISTINCT company FROM calls) companies
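A sketch that runs the combined query in SQLite via Python on the question's sequence A, B, B, C, A, A, C, B, D for one hypothetical customer. The join conditions follow the total order defined above (start time, then id), and a plain JOIN is used: it is equivalent to the RIGHT JOIN form because the WHERE clause discards unmatched rows anyway, and SQLite only supports RIGHT JOIN from version 3.39. The result matches the simplified totals from answer 0 rather than the question's table, because repeated calls to the same company are counted.

```python
import sqlite3

QUERY = """
SELECT
  companies.company,
  -- non-first calls: some earlier call by the same customer exists
  (SELECT COUNT(DISTINCT non_first_calls.call_id)
   FROM calls AS non_first_calls
   JOIN calls
     ON non_first_calls.customer = calls.customer
    AND (non_first_calls.call_start > calls.call_start
         OR (non_first_calls.call_start = calls.call_start
             AND non_first_calls.call_id > calls.call_id))
   WHERE non_first_calls.company = companies.company) AS gains,
  -- non-last calls: some later call by the same customer exists
  (SELECT COUNT(DISTINCT non_last_calls.call_id)
   FROM calls AS non_last_calls
   JOIN calls
     ON non_last_calls.customer = calls.customer
    AND (non_last_calls.call_start < calls.call_start
         OR (non_last_calls.call_start = calls.call_start
             AND non_last_calls.call_id < calls.call_id))
   WHERE non_last_calls.company = companies.company) AS losses
FROM (SELECT DISTINCT company FROM calls) AS companies
ORDER BY companies.company
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id INTEGER, customer TEXT, company TEXT, call_start TEXT)")
sequence = ["A", "B", "B", "C", "A", "A", "C", "B", "D"]
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?, ?)",
    [(i, "cust1", company, f"2014-07-31 {10 + i:02d}:00:00")
     for i, company in enumerate(sequence)],
)
for row in conn.execute(QUERY):
    print(row)  # (company, gains, losses)
```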
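About performance
I'm not sure whether this query performs acceptably on large amounts of data.
At a minimum you need a composite index on (customer, call_start) (in that order) and another index on (company). This is the output I got from running EXPLAIN on this query with those indexes and the sample data you provided.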
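The two indexes can be created with ordinary CREATE INDEX statements, and the same DDL works in MySQL. The sketch below uses SQLite from Python only to confirm the column order of the composite index; the index names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (call_id INTEGER PRIMARY KEY, customer TEXT, company TEXT, call_start TEXT)")
# Composite index: customer first, then call_start (column order matters)
conn.execute("CREATE INDEX idx_calls_customer_start ON calls (customer, call_start)")
# Separate single-column index on company
conn.execute("CREATE INDEX idx_calls_company ON calls (company)")

# PRAGMA index_info returns one (seqno, cid, name) row per indexed column
composite_cols = [row[2] for row in conn.execute("PRAGMA index_info(idx_calls_customer_start)")]
company_cols = [row[2] for row in conn.execute("PRAGMA index_info(idx_calls_company)")]
print(composite_cols, company_cols)
```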