I'm working with visitor log data and need to summarize it by IP address. The data looks like this:
id | ip_address | type     | message | ...
---+------------+----------+---------+-----
1  | 1.2.3.4    | purchase | ...
2  | 1.2.3.4    | visit    | ...
3  | 3.3.3.3    | visit    | ...
4  | 3.3.3.3    | purchase | ...
5  | 4.4.4.4    | visit    | ...
6  | 4.4.4.4    | visit    | ...
Each IP should be collapsed to a single row, chosen by this ordering:
type="purchase" DESC, type="visit" DESC, id DESC
Yielding:
chosenid | ip_address | type     | message | ...
---------+------------+----------+---------+-----
1        | 1.2.3.4    | purchase | ...
4        | 3.3.3.3    | purchase | ...
6        | 4.4.4.4    | visit    | ...
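The ordering rule can be read as "for each IP, keep the row with the largest (is_purchase, is_visit, id) tuple". Below is a minimal in-memory Python sketch of that rule, outside any database, using the six sample rows. It assumes the message column can be ignored. The names `priority` and `pick_per_ip` are hypothetical.

```python
# Sample rows from the question as (id, ip_address, type); message is omitted.
ROWS = [
    (1, "1.2.3.4", "purchase"),
    (2, "1.2.3.4", "visit"),
    (3, "3.3.3.3", "visit"),
    (4, "3.3.3.3", "purchase"),
    (5, "4.4.4.4", "visit"),
    (6, "4.4.4.4", "visit"),
]


def priority(row):
    """Mirror `type="purchase" DESC, type="visit" DESC, id DESC` as a tuple.

    Python compares tuples element by element and False < True, so the row
    with the largest tuple is the one that sorts first under the rule.
    """
    row_id, _ip, row_type = row
    return (row_type == "purchase", row_type == "visit", row_id)


def pick_per_ip(rows):
    """Return the preferred row for each IP, ordered by id."""
    best = {}
    for row in rows:
        ip = row[1]
        if ip not in best or priority(row) > priority(best[ip]):
            best[ip] = row
    return sorted(best.values())


if __name__ == "__main__":
    for row in pick_per_ip(ROWS):
        print(row)
```

Running it prints the rows with ids 1, 4 and 6, which matches the expected output.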
Is there an elegant way to get this data?
One ugly approach is the following:
-- number every row of log in priority order
SET @row_num = 0;
CREATE TEMPORARY TABLE IF NOT EXISTS tt AS
SELECT *, @row_num := @row_num + 1 AS row_index
FROM log
ORDER BY type = "purchase" DESC, type = "visit" DESC, id DESC;
Then, for each ip_address, get the minimum row_index and its id (https://stackoverflow.com/questions/121387/fetch-the-row-which-has-the-max-value-for-a-column).
Then join those ids back to the original table.
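The temp-table-plus-row-number idea can be collapsed into a single statement with the ROW_NUMBER() window function. This is a swapped-in technique, not one of the answers here, and it needs MySQL 8.0+ or SQLite 3.25+. The minimal sketch below assumes Python's `sqlite3` module as a stand-in for MySQL, a table named `log` as in the question, and only the id, ip_address and type columns. The helper names are hypothetical.

```python
import sqlite3

# Sample rows from the question as (id, ip_address, type); message is omitted.
ROWS = [
    (1, "1.2.3.4", "purchase"),
    (2, "1.2.3.4", "visit"),
    (3, "3.3.3.3", "visit"),
    (4, "3.3.3.3", "purchase"),
    (5, "4.4.4.4", "visit"),
    (6, "4.4.4.4", "visit"),
]


def make_sample_db():
    """Build an in-memory SQLite stand-in for the question's `log` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, ip_address TEXT, type TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", ROWS)
    return conn


# ROW_NUMBER() restarts at 1 for every ip_address and numbers rows in the
# question's priority order, so rn = 1 is the row to keep for each IP.
# One query replaces both the @row_num temp table and the "minimum row_index" step.
LATEST_PER_IP = """
    SELECT id, ip_address, type
    FROM (
        SELECT id, ip_address, type,
               ROW_NUMBER() OVER (
                   PARTITION BY ip_address
                   ORDER BY type = 'purchase' DESC, type = 'visit' DESC, id DESC
               ) AS rn
        FROM log
    ) ranked
    WHERE rn = 1
    ORDER BY id
"""


def latest_per_ip(conn):
    return conn.execute(LATEST_PER_IP).fetchall()


if __name__ == "__main__":
    print(latest_per_ip(make_sample_db()))
```

Unlike the user-variable trick, the window function makes the numbering order explicit in the query itself.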
Answer 0 (score: 1)
I think this is what you need:
SELECT yourtable.*
FROM
yourtable INNER JOIN (
SELECT ip_address,
MAX(CASE WHEN type='purchase' THEN id END) max_purchase,
MAX(CASE WHEN type='visit' THEN id END) max_visit
FROM yourtable
GROUP BY ip_address) m
ON yourtable.id = COALESCE(max_purchase, max_visit)
See the fiddle here.
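My subquery returns, for each IP, the maximum purchase id (or NULL if there are no purchases) and the maximum visit id. I then join the table on COALESCE: if max_purchase is not NULL, the join uses max_purchase; otherwise it uses max_visit.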
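To check the MAX/CASE/COALESCE logic against the sample data, here is a minimal sketch that runs this answer's query through Python's `sqlite3` module as a stand-in for MySQL. It assumes `yourtable` is renamed to `log` and only the id, ip_address and type columns are kept. It also returns the intermediate subquery so the NULL for an IP without purchases is visible. The helper names are hypothetical.

```python
import sqlite3

# Sample rows from the question as (id, ip_address, type); message is omitted.
ROWS = [
    (1, "1.2.3.4", "purchase"),
    (2, "1.2.3.4", "visit"),
    (3, "3.3.3.3", "visit"),
    (4, "3.3.3.3", "purchase"),
    (5, "4.4.4.4", "visit"),
    (6, "4.4.4.4", "visit"),
]


def make_sample_db():
    """Build an in-memory SQLite stand-in for the answer's table, named `log`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, ip_address TEXT, type TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", ROWS)
    return conn


# Per IP: highest purchase id (NULL if none) and highest visit id.
MAXIMA = """
    SELECT ip_address,
           MAX(CASE WHEN type = 'purchase' THEN id END) AS max_purchase,
           MAX(CASE WHEN type = 'visit' THEN id END) AS max_visit
    FROM log
    GROUP BY ip_address
    ORDER BY ip_address
"""

# COALESCE picks the purchase id when there is one, else the visit id.
ANSWER0 = """
    SELECT log.id, log.ip_address, log.type
    FROM log
    INNER JOIN (
        SELECT ip_address,
               MAX(CASE WHEN type = 'purchase' THEN id END) AS max_purchase,
               MAX(CASE WHEN type = 'visit' THEN id END) AS max_visit
        FROM log
        GROUP BY ip_address
    ) m ON log.id = COALESCE(m.max_purchase, m.max_visit)
    ORDER BY log.id
"""


def per_ip_maxima(conn):
    return conn.execute(MAXIMA).fetchall()


def answer0_rows(conn):
    return conn.execute(ANSWER0).fetchall()


if __name__ == "__main__":
    conn = make_sample_db()
    print(per_ip_maxima(conn))
    print(answer0_rows(conn))
```

For 4.4.4.4 the subquery yields max_purchase = NULL and max_visit = 6, so COALESCE falls back to the visit row.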
Answer 1 (score: 0)
You can use Bill Karwin's approach here:
SELECT t1.*
FROM (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END is_purchase FROM myTable) t1
LEFT JOIN (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END is_purchase FROM myTable) t2
ON t1.ip_address = t2.ip_address
AND (t2.is_purchase > t1.is_purchase
OR (t2.is_purchase = t1.is_purchase AND t2.id > t1.id))
WHERE t2.id IS NULL
SQL Fiddle here
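The exclusion-join idea is: keep a row only if no "better" row exists for the same IP. Below is a minimal sketch that runs this answer's query on the sample data through Python's `sqlite3` as a stand-in for MySQL. It assumes `myTable` is renamed to `log` and only the id, ip_address and type columns are kept. The helper names are hypothetical.

```python
import sqlite3

# Sample rows from the question as (id, ip_address, type); message is omitted.
ROWS = [
    (1, "1.2.3.4", "purchase"),
    (2, "1.2.3.4", "visit"),
    (3, "3.3.3.3", "visit"),
    (4, "3.3.3.3", "purchase"),
    (5, "4.4.4.4", "visit"),
    (6, "4.4.4.4", "visit"),
]


def make_sample_db():
    """Build an in-memory SQLite stand-in for the answer's table, named `log`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, ip_address TEXT, type TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", ROWS)
    return conn


# For each row t1, the LEFT JOIN looks for a better row t2 with the same IP:
# either a purchase when t1 is not one, or a row with the same is_purchase
# flag and a higher id. Rows that find no better row (t2.id IS NULL) win.
EXCLUSION_JOIN = """
    SELECT t1.id, t1.ip_address, t1.type
    FROM (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END AS is_purchase
          FROM log) t1
    LEFT JOIN (SELECT *, CASE WHEN type = 'purchase' THEN 1 ELSE 0 END AS is_purchase
               FROM log) t2
      ON t1.ip_address = t2.ip_address
     AND (t2.is_purchase > t1.is_purchase
          OR (t2.is_purchase = t1.is_purchase AND t2.id > t1.id))
    WHERE t2.id IS NULL
    ORDER BY t1.id
"""


def answer1_rows(conn):
    return conn.execute(EXCLUSION_JOIN).fetchall()


if __name__ == "__main__":
    print(answer1_rows(make_sample_db()))
```

Because the self-join compares pairs of rows within each IP, it may get slow for IPs with many log entries.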
Answer 2 (score: 0)
The following query uses a correlated subquery to get the most recent id according to your rules:
select t.ip_address,
       (select t2.id
        from yourtable t2
        where t2.ip_address = t.ip_address
        order by t2.type = 'purchase' desc, t2.id desc
        limit 1
       ) as mostrecent
from (select distinct ip_address
      from yourtable
     ) t;
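The idea is to order each IP's rows with purchases first (by descending id), then visits, and take the first row in the list. If you have a table of IP addresses, you don't need the distinct subquery; just use that table instead.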
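Below is a minimal sketch of the correlated-subquery step on the sample data, using Python's `sqlite3` as a stand-in for MySQL. It assumes the placeholder table name is replaced by `log` and only the id, ip_address and type columns are kept. The helper names are hypothetical. SQLite, like MySQL, evaluates `type = 'purchase'` to 1 or 0, so `ORDER BY ... DESC` puts purchases first.

```python
import sqlite3

# Sample rows from the question as (id, ip_address, type); message is omitted.
ROWS = [
    (1, "1.2.3.4", "purchase"),
    (2, "1.2.3.4", "visit"),
    (3, "3.3.3.3", "visit"),
    (4, "3.3.3.3", "purchase"),
    (5, "4.4.4.4", "visit"),
    (6, "4.4.4.4", "visit"),
]


def make_sample_db():
    """Build an in-memory SQLite stand-in for the answer's table, named `log`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, ip_address TEXT, type TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", ROWS)
    return conn


# For each distinct IP, the correlated subquery sorts that IP's rows
# (purchases first, then highest id) and keeps only the first id.
MOST_RECENT = """
    SELECT ips.ip_address,
           (SELECT t2.id
            FROM log t2
            WHERE t2.ip_address = ips.ip_address
            ORDER BY t2.type = 'purchase' DESC, t2.id DESC
            LIMIT 1) AS mostrecent
    FROM (SELECT DISTINCT ip_address FROM log) ips
    ORDER BY ips.ip_address
"""


def most_recent_ids(conn):
    return conn.execute(MOST_RECENT).fetchall()


if __name__ == "__main__":
    print(most_recent_ids(make_sample_db()))
```

If a separate table of IP addresses exists, it would replace the `SELECT DISTINCT` derived table in the FROM clause.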
To get the final result, you can join back to this, or use IN or EXISTS. This version uses a join:
select t.*
from yourtable t join
     (select ips.ip_address,
             (select t2.id
              from yourtable t2
              where t2.ip_address = ips.ip_address
              order by t2.type = 'purchase' desc, t2.id desc
              limit 1
             ) as mostrecent
      from (select distinct ip_address
            from yourtable
           ) ips
     ) ids
     on t.id = ids.mostrecent;
This query works best with an index on the table's (ip_address, type, id) columns.
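Below is a minimal end-to-end sketch of the final join with the suggested composite index. It uses Python's `sqlite3` as a stand-in for MySQL, so it only shows that the query and index work together on the sample data. It does not claim anything about MySQL's query plan. The table name `log`, the index name `idx_log_ip_type_id` and the helper names are hypothetical, and only the id, ip_address and type columns are kept.

```python
import sqlite3

# Sample rows from the question as (id, ip_address, type); message is omitted.
ROWS = [
    (1, "1.2.3.4", "purchase"),
    (2, "1.2.3.4", "visit"),
    (3, "3.3.3.3", "visit"),
    (4, "3.3.3.3", "purchase"),
    (5, "4.4.4.4", "visit"),
    (6, "4.4.4.4", "visit"),
]


def build_db():
    """In-memory stand-in table plus the composite index suggested above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (id INTEGER PRIMARY KEY, ip_address TEXT, type TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", ROWS)
    conn.execute("CREATE INDEX idx_log_ip_type_id ON log (ip_address, type, id)")
    return conn


# The per-IP "most recent" ids are computed in a derived table and then
# joined back to the log to fetch the full chosen rows.
FINAL_QUERY = """
    SELECT t.id, t.ip_address, t.type
    FROM log t
    JOIN (SELECT ips.ip_address,
                 (SELECT t2.id
                  FROM log t2
                  WHERE t2.ip_address = ips.ip_address
                  ORDER BY t2.type = 'purchase' DESC, t2.id DESC
                  LIMIT 1) AS mostrecent
          FROM (SELECT DISTINCT ip_address FROM log) ips) ids
      ON t.id = ids.mostrecent
    ORDER BY t.id
"""


def final_rows(conn):
    return conn.execute(FINAL_QUERY).fetchall()


if __name__ == "__main__":
    print(final_rows(build_db()))
```

Whether the planner actually uses the index for the `type = 'purchase'` sort expression depends on the engine. Checking that requires EXPLAIN on the real MySQL table.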