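I have the following SQL table (SparkSQL):
user_id, city, timestamp, item_id
For each date, I need to find the top 10 items in each city, ranked by how many times the item_id appears in that city on that date.
This is what I tried: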
SELECT *
FROM (
    SELECT *,
           row_number() OVER (PARTITION BY city) AS rn
    FROM mytable) AS foo
ORDER BY rn DESC
However, although the result is sorted by rn, it does not give me the top 10 items for a given date. What is the correct way to solve this? Thanks!
Answer 0 (score: 2)
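I'm not sure which Spark function truncates the time portion of a timestamp; the query below uses to_date for that.
Either way, you first need to compute the count for each item, and only then apply row_number to those counts.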
SELECT *
FROM (
    SELECT city, item_id, theDATE, cnt,
           ROW_NUMBER() OVER (PARTITION BY city, theDATE
                              ORDER BY cnt DESC) rn -- most frequent item gets rn = 1
    FROM (SELECT city,
                 item_id,
                 to_date(timestamp) AS theDATE, -- remove time and leave just date.
                 COUNT(*) AS cnt -- occurrences of this item in this city on this date
          FROM mytable
          GROUP BY city, item_id, to_date(timestamp) -- one row per (city, date, item)
         ) AS foo
     ) AS boo
WHERE rn <= 10
ORDER BY city, theDATE, rn
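
To sanity-check the count-then-rank idea without a Spark cluster, here is a minimal sketch that runs the same shape of query on a few made-up rows. It is not Spark: it uses Python's built-in sqlite3 as a stand-in (assuming a SQLite build with window function support, 3.25 or later), and SQLite's date() in place of to_date. The sample rows, the top_items helper, the TOP_N value of 2, and the item_id tie-breaker in the window ORDER BY are all assumptions made for the illustration.

```python
import sqlite3

# Hypothetical sample rows: (user_id, city, timestamp, item_id).
rows = [
    (1, "Paris", "2017-01-01 09:00:00", "a"),
    (2, "Paris", "2017-01-01 10:00:00", "a"),
    (3, "Paris", "2017-01-01 11:00:00", "a"),
    (4, "Paris", "2017-01-01 12:00:00", "b"),
    (5, "Paris", "2017-01-01 13:00:00", "b"),
    (6, "Paris", "2017-01-01 14:00:00", "c"),
    (7, "Paris", "2017-01-02 09:00:00", "c"),
    (8, "Rome", "2017-01-01 09:00:00", "b"),
]

TOP_N = 2  # the question asks for 10; 2 keeps the sample small

QUERY = """
SELECT city, theDATE, item_id, cnt, rn
FROM (
    SELECT city, theDATE, item_id, cnt,
           ROW_NUMBER() OVER (PARTITION BY city, theDATE
                              ORDER BY cnt DESC, item_id) AS rn
    FROM (
        SELECT city, item_id,
               date(timestamp) AS theDATE,  -- SQLite stand-in for to_date
               COUNT(*) AS cnt              -- occurrences per city, date, item
        FROM mytable
        GROUP BY city, item_id, date(timestamp)
    ) AS foo
) AS boo
WHERE rn <= ?
ORDER BY city, theDATE, rn
"""


def top_items(rows, top_n=TOP_N):
    """Load the rows into an in-memory table and return the top_n items per city and date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (user_id INTEGER, city TEXT, timestamp TEXT, item_id TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(QUERY, (top_n,)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in top_items(rows):
        print(row)
```

The extra item_id in the window ORDER BY only makes ties deterministic. If tied items should all be kept, RANK() or DENSE_RANK() could be used instead of ROW_NUMBER().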