I have an SQL table named table_name, as shown below:
+------------+------------+-----+---------------+
| login_name | session_id | ip  | creation_date |
+------------+------------+-----+---------------+
| name1      | sid1       | ip1 | date1         |
| name1      | sid1       | ip2 | date2         |
| name1      | sid1       | ip2 | date5         |
| name2      | sid2       | ip1 | date3         |
| name2      | sid2       | ip1 | date4         |
+------------+------------+-----+---------------+
I would like SQL code (Postgres, please) that selects the rows of every session_id that is used with more than one ip. For the example above, the result should be:
+------------+------------+-----+---------------+
| login_name | session_id | ip  | creation_date |
+------------+------------+-----+---------------+
| name1      | sid1       | ip1 | date1         |
| name1      | sid1       | ip2 | date2         |
+------------+------------+-----+---------------+
I have this code, which works, but I am fairly new to this and I believe it could be done better (clearer, with better performance). Note that the "session_id - ip pairs" subquery appears in it twice:
SELECT table_name.login_name, table_name.session_id, table_name.ip,
       table_name_grouped.event_date AS creation_date
FROM table_name
INNER JOIN
(
    -- session_id - ip pairs
    SELECT table_name.session_id, table_name.ip, min(table_name.creation_date) AS event_date
    FROM table_name
    GROUP BY table_name.session_id, table_name.ip
) table_name_grouped
    ON table_name.creation_date = table_name_grouped.event_date AND
       table_name.session_id = table_name_grouped.session_id AND
       table_name.ip = table_name_grouped.ip
WHERE table_name.session_id IN (
    -- get session_ids that are used with multiple ips
    SELECT table_name_grouped.session_id
    FROM
    (
        -- session_id - ip pairs
        SELECT table_name.session_id, table_name.ip, min(table_name.creation_date) AS event_date
        FROM table_name
        GROUP BY table_name.session_id, table_name.ip
    ) table_name_grouped
    GROUP BY table_name_grouped.session_id
    HAVING count(table_name_grouped.session_id) > 1
);
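So the question is how to avoid this duplication and make the query clearer and faster.
Update
I updated the example to show that only one row is needed for each column1-column2 value pair (here, each session_id-ip pair). (Thanks for the amazingly lightning-fast answers.)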
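One possible way to remove the duplicated subquery noted in the question is a common table expression (WITH), which names the "session_id - ip pairs" result once and reuses it. The sketch below is not from the thread: it assumes the sample rows from the question with the placeholder values date1..date5 stored as text, the CTE name pairs is made up for illustration, and an in-memory SQLite database stands in for Postgres so that the example can be run as-is.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; the sample rows are
# the placeholders from the question, stored as plain text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name "
    "(login_name TEXT, session_id TEXT, ip TEXT, creation_date TEXT)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [
        ("name1", "sid1", "ip1", "date1"),
        ("name1", "sid1", "ip2", "date2"),
        ("name1", "sid1", "ip2", "date5"),
        ("name2", "sid2", "ip1", "date3"),
        ("name2", "sid2", "ip1", "date4"),
    ],
)

QUERY = """
WITH pairs AS (
    -- session_id - ip pairs with their earliest creation_date, written once
    SELECT session_id, ip, min(creation_date) AS event_date
    FROM table_name
    GROUP BY session_id, ip
)
SELECT t.login_name, t.session_id, t.ip, p.event_date AS creation_date
FROM table_name t
INNER JOIN pairs p
    ON t.creation_date = p.event_date
   AND t.session_id = p.session_id
   AND t.ip = p.ip
WHERE t.session_id IN (
    -- session_ids that occur in more than one pair, i.e. with several ips
    SELECT session_id FROM pairs GROUP BY session_id HAVING count(*) > 1
)
ORDER BY t.creation_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data this yields the two sid1 rows from the expected result (date1 and date2). Whether the CTE is faster than the repeated subquery depends on the Postgres version and planner, so it is worth checking with EXPLAIN ANALYZE.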
Answer 0 (score: 3)
select *
from table_name
where session_id in
(
    select session_id
    from table_name
    group by session_id
    having count(distinct ip) > 1
)
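The inner select groups by session_id and keeps only the groups that have more than one unique ip. The outer select fetches the complete records for those session_id values.
Another possibility is to join the inner select to the outer one instead of using IN().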
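The join variant mentioned above is not spelled out in the answer, so here is a minimal sketch of what it might look like. It assumes the sample rows from the question (placeholder values stored as text), uses the made-up alias multi for the inner select, and runs against an in-memory SQLite database as a stand-in for Postgres.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; sample rows are the
# placeholders from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name "
    "(login_name TEXT, session_id TEXT, ip TEXT, creation_date TEXT)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [
        ("name1", "sid1", "ip1", "date1"),
        ("name1", "sid1", "ip2", "date2"),
        ("name1", "sid1", "ip2", "date5"),
        ("name2", "sid2", "ip1", "date3"),
        ("name2", "sid2", "ip1", "date4"),
    ],
)

QUERY = """
SELECT t.*
FROM table_name t
JOIN (
    -- one row per session_id that was seen with more than one distinct ip
    SELECT session_id
    FROM table_name
    GROUP BY session_id
    HAVING count(DISTINCT ip) > 1
) multi ON multi.session_id = t.session_id
ORDER BY t.creation_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Because the inner select produces at most one row per session_id, the join does not multiply rows. On the sample data it returns all three sid1 rows, the same as the IN() form.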
To get only the pairs with the minimum date:
select t.*
from table_name t
join
(
    select session_id, ip, min(creation_date) dt
    from table_name
    group by session_id, ip
) t2 on t.session_id = t2.session_id and t.ip = t2.ip and t.creation_date = t2.dt
where t.session_id in
(
    select session_id
    from table_name
    group by session_id
    having count(distinct ip) > 1
)
Answer 1 (score: 1)
SELECT *
FROM table_name n
WHERE EXISTS ( -- another one exists
SELECT *
FROM table_name x
WHERE x.session_id = n.session_id -- with the same session_id
AND x.ip <> n.ip -- but a different ip
);
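To see what the EXISTS query returns, the sketch below runs it against the sample rows from the question. It assumes the placeholder values are stored as text, adds an ORDER BY only to make the output order deterministic, and uses an in-memory SQLite database as a stand-in for Postgres.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Postgres; sample rows are the
# placeholders from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name "
    "(login_name TEXT, session_id TEXT, ip TEXT, creation_date TEXT)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [
        ("name1", "sid1", "ip1", "date1"),
        ("name1", "sid1", "ip2", "date2"),
        ("name1", "sid1", "ip2", "date5"),
        ("name2", "sid2", "ip1", "date3"),
        ("name2", "sid2", "ip1", "date4"),
    ],
)

QUERY = """
SELECT *
FROM table_name n
WHERE EXISTS (                      -- another row exists
    SELECT *
    FROM table_name x
    WHERE x.session_id = n.session_id   -- with the same session_id
      AND x.ip <> n.ip                  -- but a different ip
)
ORDER BY n.creation_date            -- added here for a stable output order
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data this returns all three sid1 rows, including the date5 row, so it answers the original form of the question; it does not reduce the output to one row per session_id-ip pair as the update asks.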
Answer 2 (score: 1)
You can use window functions:
SELECT login_name, session_id, ip, creation_date
FROM (
    SELECT login_name, session_id, ip, creation_date,
           MAX(ip) OVER (PARTITION BY session_id) AS maxIP,
           MIN(ip) OVER (PARTITION BY session_id) AS minIP,
           ROW_NUMBER() OVER (PARTITION BY session_id, ip
                              ORDER BY creation_date) AS rn
    FROM table_name ) t
WHERE maxIP <> minIP AND rn = 1
MAX and MIN are used to detect session_id slices that have more than one ip value (unfortunately, a window version of COUNT(DISTINCT ip) is not available in PostgreSQL).
ROW_NUMBER is used to select, for each ip of each session_id, the row with the earliest creation_date.
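The window-function query can be tried on the sample rows from the question as sketched below. It assumes the placeholder values are stored as text, adds an ORDER BY for a stable output order, and uses an in-memory SQLite database as a stand-in for Postgres (window functions need SQLite 3.25 or newer).

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+ for window functions) stands in for
# Postgres; sample rows are the placeholders from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name "
    "(login_name TEXT, session_id TEXT, ip TEXT, creation_date TEXT)"
)
conn.executemany(
    "INSERT INTO table_name VALUES (?, ?, ?, ?)",
    [
        ("name1", "sid1", "ip1", "date1"),
        ("name1", "sid1", "ip2", "date2"),
        ("name1", "sid1", "ip2", "date5"),
        ("name2", "sid2", "ip1", "date3"),
        ("name2", "sid2", "ip1", "date4"),
    ],
)

QUERY = """
SELECT login_name, session_id, ip, creation_date
FROM (
    SELECT login_name, session_id, ip, creation_date,
           -- max and min differ only when the session has several ips
           MAX(ip) OVER (PARTITION BY session_id) AS maxIP,
           MIN(ip) OVER (PARTITION BY session_id) AS minIP,
           -- rn = 1 marks the earliest row of each session_id-ip pair
           ROW_NUMBER() OVER (PARTITION BY session_id, ip
                              ORDER BY creation_date) AS rn
    FROM table_name
) t
WHERE maxIP <> minIP AND rn = 1
ORDER BY creation_date
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

On the sample data this keeps the date1 and date2 rows of sid1 and drops both the later date5 duplicate and every sid2 row, which matches the expected result in the question.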