In a previous question, the awesome amdixon came up with a query that calculates how repetitive an IP is.
I've been using it with WHERE earning_account_id = ?:
SELECT MAX(repeat_factor)
FROM
(
SELECT earning_ip, count(*) / rc.row_count AS repeat_factor
FROM earnings
CROSS JOIN (SELECT count(*) AS row_count FROM earnings WHERE earning_account_id = ?) rc
WHERE earning_account_id = ?
GROUP BY earning_ip
) q
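Here is a minimal sketch that runs amdixon's per-account query on the sample rows from the table further down. It uses an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL so that the example is runnable. The one change to the SQL is "1.0 *": SQLite's "/" truncates when both operands are integers, while MySQL's "/" returns a decimal. The helper name max_repeat_factor is hypothetical.

```python
import sqlite3

# Sample rows from the example table in this question: (earning_account_id, earning_ip)
ROWS = [
    (1, "45.55.80.86"), (1, "45.55.80.86"),
    (2, "1.22.83.65"), (2, "91.15.76.37"),
    (3, "45.55.80.86"),
    (4, "61.25.76.37"), (4, "1.22.83.65"), (4, "45.55.80.86"),
    (5, "61.25.76.37"),
]

# amdixon's per-account query; "1.0 *" forces real (non-truncating) division in SQLite.
PER_ACCOUNT_SQL = """
SELECT MAX(repeat_factor)
FROM (
    SELECT earning_ip, 1.0 * count(*) / rc.row_count AS repeat_factor
    FROM earnings
    CROSS JOIN (SELECT count(*) AS row_count FROM earnings
                WHERE earning_account_id = ?) rc
    WHERE earning_account_id = ?
    GROUP BY earning_ip
) q
"""


def max_repeat_factor(conn, account_id):
    """Highest share that any single IP has among one account's rows."""
    (value,) = conn.execute(PER_ACCOUNT_SQL, (account_id, account_id)).fetchone()
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earnings (earning_account_id INTEGER, earning_ip TEXT)")
conn.executemany("INSERT INTO earnings VALUES (?, ?)", ROWS)

print(max_repeat_factor(conn, 1))  # account 1 always uses the same IP
print(max_repeat_factor(conn, 4))  # account 4 spreads its rows over three IPs
```

The query only ever looks at one account at a time. That is why it cannot see the same IP showing up across several alt accounts.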
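But now I want to add an extra level of security.
I want to apply the same kind of query. However, instead of restricting it to a single earning_account_id, I want to restrict it to the group of all accounts that have used a specific IP address.
That way I can better detect proxy spam across the board when someone uses multiple alt accounts.
Note that I will no longer be using WHERE earning_account_id = ?
In other words, if the IP address is "45.55.80.86":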
+--------------------+-------------+-------------------------------------+
| earning_account_id | earning_ip  | select row for repeat_factor query? |
+--------------------+-------------+-------------------------------------+
| 1                  | 45.55.80.86 | YES                                 |
| 1                  | 45.55.80.86 | YES                                 |
| 2                  | 1.22.83.65  | NO                                  |
| 2                  | 91.15.76.37 | NO                                  |
| 3                  | 45.55.80.86 | YES                                 |
| 4                  | 61.25.76.37 | YES                                 |
| 4                  | 1.22.83.65  | YES                                 |
| 4                  | 45.55.80.86 | YES                                 |
| 5                  | 61.25.76.37 | NO                                  |
+--------------------+-------------+-------------------------------------+
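The value to return would be the repeat_factor over all earnings for this IP, ignoring every account that has never used this IP address.
In other words, what I'm trying to find out is:
"How much does this IP address repeat across all accounts, looking only at the accounts that have used this IP address?"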
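Before writing any SQL, it helps to pin down the requested value with a plain-Python reference. This is a minimal sketch that assumes "repeat factor" means an IP's share of all rows belonging to the accounts that have used the target IP, which are the rows marked YES in the table. The function name repeat_factors_for_ip_group is hypothetical. It returns the factor for every IP in that subset, so you can read off the target IP's own share as well as the maximum (as in the original MAX(repeat_factor)).

```python
from collections import Counter

# Sample rows from the table above: (earning_account_id, earning_ip)
ROWS = [
    (1, "45.55.80.86"), (1, "45.55.80.86"),
    (2, "1.22.83.65"), (2, "91.15.76.37"),
    (3, "45.55.80.86"),
    (4, "61.25.76.37"), (4, "1.22.83.65"), (4, "45.55.80.86"),
    (5, "61.25.76.37"),
]


def repeat_factors_for_ip_group(rows, ip):
    """Share of each IP among the rows of accounts that have ever used `ip`."""
    # Accounts that used the target IP at least once (the YES accounts).
    accounts = {acct for acct, row_ip in rows if row_ip == ip}
    # Every row of those accounts, whatever IP it carries.
    subset = [row_ip for acct, row_ip in rows if acct in accounts]
    counts = Counter(subset)
    return {row_ip: n / len(subset) for row_ip, n in counts.items()}


factors = repeat_factors_for_ip_group(ROWS, "45.55.80.86")
print(factors["45.55.80.86"])  # share of the target IP itself
print(max(factors.values()))   # MAX(repeat_factor) over the subset
```

With the sample data, the subset is the 6 YES rows. The target IP appears in 4 of them, so its factor is 4/6.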
Answer 0 (score: 1)
Update
Building on the ideas in How to get multiple counts with one SQL query? and @SteveChambers' answer, this can be simplified further.
SELECT sum(CASE WHEN earning_ip = ? THEN 1 ELSE 0 END) / count(*)
FROM earnings WHERE earning_account_id IN (
SELECT DISTINCT earning_account_id FROM earnings WHERE earning_ip = ?
)
With the example IP 45.55.80.86, this also gives 0.6667.
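The simplified query can be checked against the sample data. This minimal sketch runs it in SQLite through Python's sqlite3 module, which stands in for MySQL. The only change is "1.0 *", added because SQLite truncates integer division. The helper name ip_share is hypothetical. Both "?" placeholders receive the same IP.

```python
import sqlite3

# Sample rows from the question's table: (earning_account_id, earning_ip)
ROWS = [
    (1, "45.55.80.86"), (1, "45.55.80.86"),
    (2, "1.22.83.65"), (2, "91.15.76.37"),
    (3, "45.55.80.86"),
    (4, "61.25.76.37"), (4, "1.22.83.65"), (4, "45.55.80.86"),
    (5, "61.25.76.37"),
]

# The simplified query: rows matching the IP / all rows of accounts that used the IP.
SHARE_SQL = """
SELECT 1.0 * sum(CASE WHEN earning_ip = ? THEN 1 ELSE 0 END) / count(*)
FROM earnings WHERE earning_account_id IN (
    SELECT DISTINCT earning_account_id FROM earnings WHERE earning_ip = ?
)
"""


def ip_share(conn, ip):
    """Share of `ip` among all rows of accounts that have used it (None if unused)."""
    (value,) = conn.execute(SHARE_SQL, (ip, ip)).fetchone()
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earnings (earning_account_id INTEGER, earning_ip TEXT)")
conn.executemany("INSERT INTO earnings VALUES (?, ?)", ROWS)

print(round(ip_share(conn, "45.55.80.86"), 4))
```

If the IP never appears, the sum is NULL and there are no rows to count, so the query returns NULL (None in Python) rather than raising an error.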
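I'm leaving the original answer below because parts of it may be useful for another query.
Original answer
Working step by step from a modified version of the subquery, the following returns the account IDs that have used a given IP.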
SELECT earning_account_id
FROM earnings WHERE earning_ip = ?
GROUP BY earning_account_id
For the example IP 45.55.80.86, the query returns 1, 3, 4.
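Here is a minimal sketch of this first step run in SQLite through Python's sqlite3 module, which stands in for MySQL. The helper name accounts_using_ip is hypothetical. The result is sorted in Python only to make the order deterministic, because GROUP BY by itself does not promise an order.

```python
import sqlite3

# Sample rows from the question's table: (earning_account_id, earning_ip)
ROWS = [
    (1, "45.55.80.86"), (1, "45.55.80.86"),
    (2, "1.22.83.65"), (2, "91.15.76.37"),
    (3, "45.55.80.86"),
    (4, "61.25.76.37"), (4, "1.22.83.65"), (4, "45.55.80.86"),
    (5, "61.25.76.37"),
]

# Step 1: every account that has used the IP, each listed once.
STEP1_SQL = """
SELECT earning_account_id
FROM earnings WHERE earning_ip = ?
GROUP BY earning_account_id
"""


def accounts_using_ip(conn, ip):
    """Sorted IDs of the accounts that have at least one row with `ip`."""
    return sorted(acct for (acct,) in conn.execute(STEP1_SQL, (ip,)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earnings (earning_account_id INTEGER, earning_ip TEXT)")
conn.executemany("INSERT INTO earnings VALUES (?, ?)", ROWS)

print(accounts_using_ip(conn, "45.55.80.86"))
```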
Next, count the occurrences of the given IP for each of those IDs.
SELECT earning_account_id, count(earning_ip) AS occurrence
FROM earnings
WHERE earning_account_id IN (
SELECT earning_account_id
FROM earnings WHERE earning_ip = ?
GROUP BY earning_account_id
) AND earning_ip = ?
GROUP BY earning_account_id
For the example, this returns 1 => 2, 3 => 1, 4 => 1.
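This step can be reproduced with a minimal sketch in SQLite through Python's sqlite3 module, which stands in for MySQL. The query text is the one above. The helper name occurrences_per_account is hypothetical. The two "?" placeholders both receive the target IP.

```python
import sqlite3

# Sample rows from the question's table: (earning_account_id, earning_ip)
ROWS = [
    (1, "45.55.80.86"), (1, "45.55.80.86"),
    (2, "1.22.83.65"), (2, "91.15.76.37"),
    (3, "45.55.80.86"),
    (4, "61.25.76.37"), (4, "1.22.83.65"), (4, "45.55.80.86"),
    (5, "61.25.76.37"),
]

# Step 2: how many rows with the IP each matching account has.
STEP2_SQL = """
SELECT earning_account_id, count(earning_ip) AS occurrence
FROM earnings
WHERE earning_account_id IN (
    SELECT earning_account_id
    FROM earnings WHERE earning_ip = ?
    GROUP BY earning_account_id
) AND earning_ip = ?
GROUP BY earning_account_id
"""


def occurrences_per_account(conn, ip):
    """Map account ID -> number of its rows that use `ip`."""
    return dict(conn.execute(STEP2_SQL, (ip, ip)).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earnings (earning_account_id INTEGER, earning_ip TEXT)")
conn.executemany("INSERT INTO earnings VALUES (?, ?)", ROWS)

print(occurrences_per_account(conn, "45.55.80.86"))
```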
Then also count all IP rows for each of those IDs, and join that with the previous result.
SELECT e.earning_account_id, count(e.earning_account_id) AS ip_count, o.occurrence
FROM earnings e
CROSS JOIN (
SELECT earning_account_id, count(earning_ip) AS occurrence FROM earnings
WHERE earning_account_id IN (
SELECT earning_account_id FROM earnings WHERE earning_ip = ?
GROUP BY earning_account_id
) AND earning_ip = ?
GROUP BY earning_account_id
) o
WHERE e.earning_account_id = o.earning_account_id
GROUP BY e.earning_account_id
For the example, the total IP counts per account are 1 => 2, 3 => 1, 4 => 3.
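Here is a minimal sketch of this join step in SQLite through Python's sqlite3 module, which stands in for MySQL. SQLite accepts the non-aggregated o.occurrence column next to GROUP BY e.earning_account_id. MySQL with ONLY_FULL_GROUP_BY enabled may be stricter about that, which is worth checking on your server. The helper name counts_per_account is hypothetical.

```python
import sqlite3

# Sample rows from the question's table: (earning_account_id, earning_ip)
ROWS = [
    (1, "45.55.80.86"), (1, "45.55.80.86"),
    (2, "1.22.83.65"), (2, "91.15.76.37"),
    (3, "45.55.80.86"),
    (4, "61.25.76.37"), (4, "1.22.83.65"), (4, "45.55.80.86"),
    (5, "61.25.76.37"),
]

# Step 3: total rows per matching account, alongside its occurrences of the IP.
STEP3_SQL = """
SELECT e.earning_account_id, count(e.earning_account_id) AS ip_count, o.occurrence
FROM earnings e
CROSS JOIN (
    SELECT earning_account_id, count(earning_ip) AS occurrence FROM earnings
    WHERE earning_account_id IN (
        SELECT earning_account_id FROM earnings WHERE earning_ip = ?
        GROUP BY earning_account_id
    ) AND earning_ip = ?
    GROUP BY earning_account_id
) o
WHERE e.earning_account_id = o.earning_account_id
GROUP BY e.earning_account_id
"""


def counts_per_account(conn, ip):
    """Map account ID -> (all rows of the account, rows of the account using `ip`)."""
    return {acct: (ip_count, occurrence)
            for acct, ip_count, occurrence in conn.execute(STEP3_SQL, (ip, ip))}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earnings (earning_account_id INTEGER, earning_ip TEXT)")
conn.executemany("INSERT INTO earnings VALUES (?, ?)", ROWS)

print(counts_per_account(conn, "45.55.80.86"))
```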
Finally, divide the sum of all occurrences by the sum of all IP rows in this subset.
SELECT sum(q.occurrence) / sum(q.ip_count) FROM (
SELECT e.earning_account_id, count(e.earning_account_id) AS ip_count, o.occurrence
FROM earnings e
CROSS JOIN (
SELECT earning_account_id, count(earning_ip) AS occurrence FROM earnings
WHERE earning_account_id IN (
SELECT earning_account_id FROM earnings WHERE earning_ip = ?
GROUP BY earning_account_id
) AND earning_ip = ?
GROUP BY earning_account_id
) o
WHERE e.earning_account_id = o.earning_account_id
GROUP BY e.earning_account_id
) q
For the example this returns 0.6667, which corresponds to the 4 occurrences of 45.55.80.86 among the 6 rows marked YES.
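The final query can be checked end to end with a minimal sketch in SQLite through Python's sqlite3 module, which stands in for MySQL. The only change is "1.0 *" on the numerator, because SQLite would otherwise compute 4 / 6 as integer 0. The helper name group_repeat_factor is hypothetical.

```python
import sqlite3

# Sample rows from the question's table: (earning_account_id, earning_ip)
ROWS = [
    (1, "45.55.80.86"), (1, "45.55.80.86"),
    (2, "1.22.83.65"), (2, "91.15.76.37"),
    (3, "45.55.80.86"),
    (4, "61.25.76.37"), (4, "1.22.83.65"), (4, "45.55.80.86"),
    (5, "61.25.76.37"),
]

# Final step: total occurrences / total rows across all matching accounts.
FINAL_SQL = """
SELECT 1.0 * sum(q.occurrence) / sum(q.ip_count) FROM (
    SELECT e.earning_account_id, count(e.earning_account_id) AS ip_count, o.occurrence
    FROM earnings e
    CROSS JOIN (
        SELECT earning_account_id, count(earning_ip) AS occurrence FROM earnings
        WHERE earning_account_id IN (
            SELECT earning_account_id FROM earnings WHERE earning_ip = ?
            GROUP BY earning_account_id
        ) AND earning_ip = ?
        GROUP BY earning_account_id
    ) o
    WHERE e.earning_account_id = o.earning_account_id
    GROUP BY e.earning_account_id
) q
"""


def group_repeat_factor(conn, ip):
    """Share of `ip` among all rows of the accounts that have used it."""
    (value,) = conn.execute(FINAL_SQL, (ip, ip)).fetchone()
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE earnings (earning_account_id INTEGER, earning_ip TEXT)")
conn.executemany("INSERT INTO earnings VALUES (?, ?)", ROWS)

print(round(group_repeat_factor(conn, "45.55.80.86"), 4))
```

This agrees with the shorter query from the update, which computes the same ratio without the intermediate per-account rows.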
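Answer 1 (score: 1)
The rows to select can be obtained simply with the query below (it runs against a test table named example, where the ip column stands in for earning_ip):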
select e.*
from example e
join
(select distinct earning_account_id
from example
where ip = '45.55.80.86') subq
on e.earning_account_id = subq.earning_account_id;
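Here is a minimal sketch of this row selection in SQLite through Python's sqlite3 module, which stands in for MySQL. It keeps the answer's test schema (table example with columns earning_account_id and ip). The IP is passed as a "?" parameter instead of being hard-coded, which is the only change to the SQL.

```python
import sqlite3

# Sample rows from the question's table, loaded into the answer's test schema.
ROWS = [
    (1, "45.55.80.86"), (1, "45.55.80.86"),
    (2, "1.22.83.65"), (2, "91.15.76.37"),
    (3, "45.55.80.86"),
    (4, "61.25.76.37"), (4, "1.22.83.65"), (4, "45.55.80.86"),
    (5, "61.25.76.37"),
]

# All rows of every account that has used the IP (the YES rows in the question).
SELECT_ROWS_SQL = """
select e.*
from example e
join
  (select distinct earning_account_id
   from example
   where ip = ?) subq
on e.earning_account_id = subq.earning_account_id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (earning_account_id INTEGER, ip TEXT)")
conn.executemany("INSERT INTO example VALUES (?, ?)", ROWS)

selected = conn.execute(SELECT_ROWS_SQL, ("45.55.80.86",)).fetchall()
for row in sorted(selected):
    print(row)
```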
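At this point, if this were a SQL Server database, you could simply wrap this in a common table expression (CTE) and use its alias in place of the two references to the table name in amdixon's query. Unfortunately MySQL doesn't provide such a luxury (at least before version 8.0, which added support for WITH), so we are limited to subqueries, each of which needs its own alias. It's a bit ugly, but it does the job: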
select max(repeat_factor)
from
(
select ip, count(*) / rc.row_count as repeat_factor
from
(select e.*
from example e
join
(select distinct earning_account_id
from example
where ip = '45.55.80.86') subq
on e.earning_account_id = subq.earning_account_id) cte1
cross join ( select count(*) as row_count from
(select e.*
from example e
join
(select distinct earning_account_id
from example
where ip = '45.55.80.86') subq
on e.earning_account_id = subq.earning_account_id) cte2
) rc
group by ip
) q;
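On a database that does support WITH (MySQL 8.0 and later, and also SQLite), the duplicated derived table can be written once as a CTE. Here is a minimal sketch of that rewrite run in SQLite through Python's sqlite3 module, using the answer's test schema. The CTE name subset is hypothetical, and "1.0 *" again avoids SQLite's integer division. Otherwise the logic matches the query above.

```python
import sqlite3

# Sample rows from the question's table, in the answer's test schema (example, ip).
ROWS = [
    (1, "45.55.80.86"), (1, "45.55.80.86"),
    (2, "1.22.83.65"), (2, "91.15.76.37"),
    (3, "45.55.80.86"),
    (4, "61.25.76.37"), (4, "1.22.83.65"), (4, "45.55.80.86"),
    (5, "61.25.76.37"),
]

# The derived table is defined once as "subset" and referenced twice.
CTE_SQL = """
WITH subset AS (
    SELECT e.*
    FROM example e
    JOIN (SELECT DISTINCT earning_account_id
          FROM example
          WHERE ip = ?) subq
      ON e.earning_account_id = subq.earning_account_id
)
SELECT max(repeat_factor)
FROM (
    SELECT ip, 1.0 * count(*) / rc.row_count AS repeat_factor
    FROM subset
    CROSS JOIN (SELECT count(*) AS row_count FROM subset) rc
    GROUP BY ip
) q
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE example (earning_account_id INTEGER, ip TEXT)")
conn.executemany("INSERT INTO example VALUES (?, ?)", ROWS)

(max_factor,) = conn.execute(CTE_SQL, ("45.55.80.86",)).fetchone()
print(round(max_factor, 4))
```

Note that this returns the largest repeat factor of any IP in the subset. With the sample data, that happens to be the target IP's own 4/6.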