我是sql的新手,我正试图找出以下内容.. 想象一下下表:
user_id, category_id
1, 12344
1, 12344
1, 12345
2, 12345
2, 12345
3, 12344
3, 12344
等等.. 我想找到每个类别的重复用户数量..
所以,在上面的例子中..
12344, 2 (because user_id 1 and 3 are repeated users)
12345, 1 (user_id 2 is repeated user.. 1 is not as that user visited just once)
我如何在sql / hive中解决这个问题?
答案 0 :(得分:1)
要跟踪你想要的东西有点困难,但要测试一下:
select category_id, count(user_id) from (Select category_id, user_id, count(table_primary_id) as 'total' from tablename group by category_id, user_id) a where total > 1 group by category_id
子查询计算用户访问某个类别的次数,而外部查询应计算多次访问该类别的用户数。
答案 1 :(得分:1)
E.g:
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL AUTO_INCREMENT PRIMARY KEY
,user_id INT NOT NULL
,category_id INT NOT NULL
);
INSERT INTO my_table (user_id,category_id) VALUES
(1, 12344),
(1, 12344),
(1, 12345),
(2, 12345),
(2, 12345),
(3, 12344),
(3, 12344);
SELECT category_id
, COUNT(*) total
FROM
( SELECT x.*
FROM my_table x
JOIN my_table y
ON y.user_id = x.user_id
AND y.category_id = x.category_id
AND y.id < x.id
) a
GROUP
BY category_id;
+-------------+-------+
| category_id | total |
+-------------+-------+
| 12344 | 2 |
| 12345 | 1 |
+-------------+-------+