SQL: select rows whose column1 value occurs with multiple column2 values

Posted: 2015-07-09 19:22:26

Tags: sql postgresql select

I have an SQL table named table_name, like this:

+------------+------------+-----+---------------+
| login_name | session_id | ip  | creation_date |
+------------+------------+-----+---------------+
| name1      | sid1       | ip1 | date1         |
| name1      | sid1       | ip2 | date2         |
| name1      | sid1       | ip2 | date5         |
| name2      | sid2       | ip1 | date3         |
| name2      | sid2       | ip1 | date4         |
+------------+------------+-----+---------------+

I want an SQL query (Postgres, please) that selects the rows whose session_id was used with more than one ip. For the example above, the result should be:

+------------+------------+-----+---------------+
| login_name | session_id | ip  | creation_date |
+------------+------------+-----+---------------+
| name1      | sid1       | ip1 | date1         |
| name1      | sid1       | ip2 | date2         |
+------------+------------+-----+---------------+

I have this query, which works. I'm fairly new to SQL, though, and I believe it can be done better (cleaner and with better performance). Note that the "session_id - ip pairs" subquery appears twice:

    SELECT table_name.login_name, table_name.session_id, table_name.ip, table_name_grouped.event_date AS creation_date 
    FROM table_name
    INNER JOIN
    (
        -- session_id - ip pairs
        SELECT table_name.session_id, table_name.ip, min(table_name.creation_date) AS event_date
        FROM table_name
        GROUP BY table_name.session_id, table_name.ip
    ) table_name_grouped
    ON table_name.creation_date = table_name_grouped.event_date AND 
        table_name.session_id = table_name_grouped.session_id AND
        table_name.ip = table_name_grouped.ip
    WHERE table_name.session_id IN (
        -- get session_ids that used in multiple ips
        SELECT table_name_grouped.session_id
        FROM 
        (
            -- session_id - ip pairs
            SELECT table_name.session_id, table_name.ip, min(table_name.creation_date) AS event_date
            FROM table_name
            GROUP BY table_name.session_id, table_name.ip
        ) table_name_grouped
        GROUP BY table_name_grouped.session_id
        HAVING count(table_name_grouped.session_id) > 1
    );
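A common table expression (WITH) is one way to remove that duplication. The pairs subquery is written once and referenced in both places. The sketch below is not from the question or the answers. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL; both support WITH. The data is the question's sample rows, with the dateN placeholders stored as strings that sort in date order. The helper name run is illustrative only.

```python
import sqlite3

# Sample rows from the question; "dateN" placeholders are plain strings
# that sort in the same order as the dates they stand for.
ROWS = [
    ("name1", "sid1", "ip1", "date1"),
    ("name1", "sid1", "ip2", "date2"),
    ("name1", "sid1", "ip2", "date5"),
    ("name2", "sid2", "ip1", "date3"),
    ("name2", "sid2", "ip1", "date4"),
]

# The question's query with the duplicated subquery moved into a CTE.
QUERY = """
WITH pairs AS (
    -- one row per session_id - ip pair, with its earliest creation_date
    SELECT session_id, ip, min(creation_date) AS event_date
    FROM table_name
    GROUP BY session_id, ip
)
SELECT t.login_name, t.session_id, t.ip, p.event_date AS creation_date
FROM table_name t
JOIN pairs p
  ON t.session_id = p.session_id
 AND t.ip = p.ip
 AND t.creation_date = p.event_date
WHERE t.session_id IN (
    -- sessions appearing in more than one pair, i.e. used with several ips
    SELECT session_id FROM pairs GROUP BY session_id HAVING count(*) > 1
)
ORDER BY t.session_id, t.ip
"""


def run(query):
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name "
                 "(login_name TEXT, session_id TEXT, ip TEXT, creation_date TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(QUERY))
# [('name1', 'sid1', 'ip1', 'date1'), ('name1', 'sid1', 'ip2', 'date2')]
```

The rewrite keeps the question's join on min(creation_date). If two rows of the same pair share the earliest date, both are still returned.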

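So the questions are:

  1. Can the above solution be improved?
  2. Do you see any potential problems, for example with performance?

Update: I updated the example to show that only one row is needed per column1-column2 value pair (the one with the earliest creation_date). (Thanks for the amazing, lightning-fast answers.)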

3 answers:

Answer 0 (score: 3)

select * from table_name
where session_id in
(
    select session_id
    from table_name
    group by session_id
    having count(distinct ip) > 1
)

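The inner select groups by session_id and keeps only the session_ids that have more than one distinct ip. The outer select then fetches the complete records for those session_ids.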

Another possibility is to join the inner select to the outer one instead of using IN().
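Here is a minimal sketch of that JOIN variant next to the IN() version. It assumes SQLite through Python's sqlite3 module as a stand-in for PostgreSQL, plus the question's sample rows; the names run, IN_QUERY and JOIN_QUERY are illustrative. The grouped subquery becomes a derived table joined on session_id. It returns each session_id at most once, so the join cannot duplicate rows and both forms return the same result.

```python
import sqlite3

# Sample rows from the question ("dateN" placeholders stored as strings).
ROWS = [
    ("name1", "sid1", "ip1", "date1"),
    ("name1", "sid1", "ip2", "date2"),
    ("name1", "sid1", "ip2", "date5"),
    ("name2", "sid2", "ip1", "date3"),
    ("name2", "sid2", "ip1", "date4"),
]

# The answer's first query: filter with IN (...).
IN_QUERY = """
SELECT * FROM table_name
WHERE session_id IN (
    SELECT session_id FROM table_name
    GROUP BY session_id
    HAVING count(DISTINCT ip) > 1
)
ORDER BY creation_date
"""

# The JOIN alternative: the same grouped subquery as a derived table.
JOIN_QUERY = """
SELECT t.*
FROM table_name t
JOIN (
    SELECT session_id FROM table_name
    GROUP BY session_id
    HAVING count(DISTINCT ip) > 1
) multi ON multi.session_id = t.session_id
ORDER BY t.creation_date
"""


def run(query):
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name "
                 "(login_name TEXT, session_id TEXT, ip TEXT, creation_date TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(JOIN_QUERY))
# [('name1', 'sid1', 'ip1', 'date1'), ('name1', 'sid1', 'ip2', 'date2'), ('name1', 'sid1', 'ip2', 'date5')]
```

Both forms return all three sid1 rows. Reducing them to one row per pair is what the next query in this answer does.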

To get only the row with the minimum date for each session_id - ip pair:

select t.* 
from table_name t
join
(
    select session_id, ip, min(creation_date) dt
    from table_name
    group by session_id, ip
) t2 on t.session_id = t2.session_id and t.ip = t2.ip and t.creation_date = t2.dt
where t.session_id in
(
    select session_id
    from table_name
    group by session_id
    having count(distinct ip) > 1
)

Answer 1 (score: 1)

SELECT * 
FROM table_name n
WHERE EXISTS ( -- another one exists
   SELECT *
   FROM table_name x
   WHERE x.session_id = n.session_id -- with the same session_id
     AND x.ip <> n.ip                -- but a different ip
   );
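As written, this query returns every row of a multi-ip session, including the name1/sid1/ip2/date5 row, because it predates the question's update. The sketch below is an extension and not part of the answer. It adds a NOT EXISTS condition that drops any row for which an earlier row with the same session_id and ip exists. As with the min() join, a tie on creation_date would keep both tied rows. It assumes SQLite through Python's sqlite3 module as a stand-in for PostgreSQL; run, ANSWER_1_QUERY and EARLIEST_QUERY are illustrative names.

```python
import sqlite3

# Sample rows from the question ("dateN" placeholders stored as strings).
ROWS = [
    ("name1", "sid1", "ip1", "date1"),
    ("name1", "sid1", "ip2", "date2"),
    ("name1", "sid1", "ip2", "date5"),
    ("name2", "sid2", "ip1", "date3"),
    ("name2", "sid2", "ip1", "date4"),
]

# Answer 1 as posted: rows whose session also has a row with a different ip.
ANSWER_1_QUERY = """
SELECT * FROM table_name n
WHERE EXISTS (SELECT 1 FROM table_name x
              WHERE x.session_id = n.session_id AND x.ip <> n.ip)
ORDER BY n.creation_date
"""

# Extension (not in the answer): additionally require that no earlier row
# exists for the same session_id - ip pair, leaving one row per pair.
EARLIEST_QUERY = """
SELECT * FROM table_name n
WHERE EXISTS (SELECT 1 FROM table_name x
              WHERE x.session_id = n.session_id AND x.ip <> n.ip)
  AND NOT EXISTS (SELECT 1 FROM table_name e
                  WHERE e.session_id = n.session_id AND e.ip = n.ip
                    AND e.creation_date < n.creation_date)
ORDER BY n.creation_date
"""


def run(query):
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_name "
                 "(login_name TEXT, session_id TEXT, ip TEXT, creation_date TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(len(run(ANSWER_1_QUERY)))  # 3
print(run(EARLIEST_QUERY))
# [('name1', 'sid1', 'ip1', 'date1'), ('name1', 'sid1', 'ip2', 'date2')]
```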

Answer 2 (score: 1)

You can use window functions:

SELECT login_name, session_id, ip, creation_date
FROM (
  SELECT login_name, session_id, ip, creation_date,
         MAX(ip) OVER (PARTITION BY session_id) AS maxIP,
         MIN(ip) OVER (PARTITION BY session_id) AS minIP,
         ROW_NUMBER() OVER (PARTITION BY session_id, ip 
                            ORDER BY creation_date) AS rn
  FROM table_name ) t
WHERE maxIP <> minIP AND rn = 1

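MAX and MIN are used to detect session_id slices that have more than one distinct ip value (unfortunately, a windowed version of COUNT(DISTINCT ip) is not available in PostgreSQL).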

ROW_NUMBER is used to select the row with the earliest creation_date for each session_id, ip pair.
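The window-function query can be checked the same way. This sketch assumes SQLite 3.25 or later (the first release with window functions), run through Python's sqlite3 module as a stand-in for PostgreSQL. WINDOW_QUERY is the answer's query with an ORDER BY added for a deterministic result. DENSE_RANK_QUERY is not from the answer. It swaps MAX/MIN for a known workaround for the missing windowed COUNT(DISTINCT): an ascending DENSE_RANK plus a descending DENSE_RANK minus 1 gives the number of distinct ip values in the partition, assuming ip is never NULL.

```python
import sqlite3

# Sample rows from the question ("dateN" placeholders stored as strings).
ROWS = [
    ("name1", "sid1", "ip1", "date1"),
    ("name1", "sid1", "ip2", "date2"),
    ("name1", "sid1", "ip2", "date5"),
    ("name2", "sid2", "ip1", "date3"),
    ("name2", "sid2", "ip1", "date4"),
]

# The answer's query, plus ORDER BY for a stable result.
WINDOW_QUERY = """
SELECT login_name, session_id, ip, creation_date
FROM (
  SELECT login_name, session_id, ip, creation_date,
         MAX(ip) OVER (PARTITION BY session_id) AS maxIP,
         MIN(ip) OVER (PARTITION BY session_id) AS minIP,
         ROW_NUMBER() OVER (PARTITION BY session_id, ip
                            ORDER BY creation_date) AS rn
  FROM table_name ) t
WHERE maxIP <> minIP AND rn = 1
ORDER BY session_id, ip
"""

# Variant (not in the answer): asc rank + desc rank - 1 counts the distinct
# non-NULL ip values in each session_id partition.
DENSE_RANK_QUERY = """
SELECT login_name, session_id, ip, creation_date
FROM (
  SELECT login_name, session_id, ip, creation_date,
         DENSE_RANK() OVER (PARTITION BY session_id ORDER BY ip)
       + DENSE_RANK() OVER (PARTITION BY session_id ORDER BY ip DESC) - 1 AS n_ips,
         ROW_NUMBER() OVER (PARTITION BY session_id, ip
                            ORDER BY creation_date) AS rn
  FROM table_name ) t
WHERE n_ips > 1 AND rn = 1
ORDER BY session_id, ip
"""


def run(query):
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
    conn.execute("CREATE TABLE table_name "
                 "(login_name TEXT, session_id TEXT, ip TEXT, creation_date TEXT)")
    conn.executemany("INSERT INTO table_name VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(run(WINDOW_QUERY))
# [('name1', 'sid1', 'ip1', 'date1'), ('name1', 'sid1', 'ip2', 'date2')]
```

Unlike MAX <> MIN, the DENSE_RANK form yields the actual count. A threshold such as n_ips > 2 can then be applied if needed.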
