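I need some help fetching records that have an alternating set of entries associated with a unique value (for example, user_id).
I want the output to be only (1111, 2222, 3333).
Here is the scenario: user_id 1111 attended the .net course from 2005-01-01 to 2006-12-31.
He later attended java from 2007-01-01 to 2009-12-31, and after that he went back to .net. These are the user_ids I want to retrieve.
user_id 4444 should not be in the output, because he has no alternating courses.
Update: 4444 took java from 2007 to 2009 and then java again from 2010 to 2012. He later attended .net but never came back to java, so he has to be excluded from the output.
If GROUP BY is used, it considers the records without regard to the alternating course names. We could write a procedure that loops over the rows and compares alternating course names, but I would like to know whether a single query can do this.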
Answer 0 (score: 2)
You can use two INNER JOIN operations:
SELECT DISTINCT t1.user_id
FROM mytable AS t1
INNER JOIN mytable AS t2
ON t1.user_id = t2.user_id AND t1.id < t2.id AND t1.course_name <> t2.course_name
INNER JOIN mytable AS t3
ON t2.user_id = t3.user_id AND t2.id < t3.id AND t1.course_name = t3.course_name
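I assume id is an auto-increment field that reflects the order in which rows were inserted into the DB. Otherwise, you should use a date field in its place.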
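As a quick check of the double self-join, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The rows are the sample data used elsewhere in this thread (user_id, course_name, id). The table layout, the helper name `users_who_returned`, and the added ORDER BY are assumptions made for illustration only, and other databases may behave slightly differently.

```python
import sqlite3

# Sample rows (user_id, course_name, id). The id values are assumed to
# follow the order in which each user took the courses.
ROWS = [
    (1111, ".net", 1), (1111, "java", 2), (1111, ".net", 3),
    (2222, "java", 4), (2222, ".net", 5), (2222, ".net", 6), (2222, "java", 7),
    (3333, "java", 8), (3333, ".net", 9), (3333, "java", 10), (3333, ".net", 11),
    (4444, "java", 12), (4444, "java", 13), (4444, ".net", 14), (4444, ".net", 15),
]

# t1 = an earlier course, t2 = a later, different course,
# t3 = an even later row with the same course as t1.
QUERY = """
SELECT DISTINCT t1.user_id
FROM mytable AS t1
INNER JOIN mytable AS t2
   ON t1.user_id = t2.user_id AND t1.id < t2.id AND t1.course_name <> t2.course_name
INNER JOIN mytable AS t3
   ON t2.user_id = t3.user_id AND t2.id < t3.id AND t1.course_name = t3.course_name
ORDER BY t1.user_id
"""

def users_who_returned(rows):
    """Return the user_ids that left a course and later came back to it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE mytable (user_id INTEGER, course_name TEXT, id INTEGER)"
        )
        conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
        return [row[0] for row in conn.execute(QUERY)]
    finally:
        conn.close()

if __name__ == "__main__":
    print(users_who_returned(ROWS))
```

User 4444 drops out because no java row follows his .net rows, so the second join finds nothing for him.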
Answer 1 (score: 1)
Same as Giorgos Betsos's answer, only with SELECT DISTINCT to prevent duplicates.
SELECT DISTINCT t1.user_id
FROM mytable AS t1
INNER JOIN mytable AS t2
ON t1.user_id = t2.user_id AND t1.Start_Date < t2.Start_Date AND
t1.course_name <> t2.course_name
INNER JOIN mytable AS t3
ON t2.user_id = t3.user_id AND t2.Start_Date < t3.Start_Date AND
t1.course_name = t3.course_name
Edit: updated the answer to use Start_Date, because id is not necessarily sequential.
Answer 2 (score: 1)
Here is a version that uses windowed aggregate functions instead of multiple self-joins:
SELECT DISTINCT user_id
FROM
 (
   SELECT user_id
     ,course_name
     ,start_date
     ,RANK()                -- number all of the user's courses
      OVER (PARTITION BY user_id
            ORDER BY start_date)
      -
      RANK()                -- number each course separately
      OVER (PARTITION BY user_id, course_name
            ORDER BY start_date) AS x
   FROM tab
 ) dt
GROUP BY user_id, course_name
HAVING MIN(x) <> MAX(x)     -- same course, but another one in between
x stays the same while a user takes the same course several times in a row, and changes if another course comes in between.
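To see why the difference of the two rankings works, the following sketch recomputes x in plain Python for one user's courses. It assumes the list is already sorted by start date and that the dates are distinct, so that RANK() behaves like a simple running count. The function names are hypothetical and only serve the illustration.

```python
from collections import defaultdict

def rank_differences(courses):
    """Return (course, x) pairs for one user's courses, sorted by start date.

    x = position among all of the user's courses
        minus position among the rows of the same course.
    """
    seen = defaultdict(int)
    result = []
    for overall_rank, course in enumerate(courses, start=1):
        seen[course] += 1  # rank within this course
        result.append((course, overall_rank - seen[course]))
    return result

def returned_to_a_course(courses):
    """Mirror HAVING MIN(x) <> MAX(x): some course has more than one x value."""
    xs_by_course = defaultdict(set)
    for course, x in rank_differences(courses):
        xs_by_course[course].add(x)
    return any(len(xs) > 1 for xs in xs_by_course.values())

if __name__ == "__main__":
    print(rank_differences([".net", "java", ".net"]))          # user 1111
    print(rank_differences(["java", "java", ".net", ".net"]))  # user 4444
```

For user 1111 the two .net rows get x = 0 and x = 1, so the HAVING condition holds. For user 4444 every java row has x = 0 and every .net row has x = 2, so he is filtered out.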
Answer 3 (score: 0)
This uses a single table scan and does not rely on GROUP BY:
WITH table_name ( user_id, start_date, end_date, course_name, id ) AS (
SELECT 1111, DATE '2005-01-01', DATE '2006-12-31', '.net', 1 FROM DUAL UNION ALL
SELECT 1111, DATE '2007-01-01', DATE '2009-12-31', 'java', 2 FROM DUAL UNION ALL
SELECT 1111, DATE '2010-01-01', DATE '2020-12-31', '.net', 3 FROM DUAL UNION ALL
SELECT 2222, DATE '2005-01-01', DATE '2006-12-31', 'java', 4 FROM DUAL UNION ALL
SELECT 2222, DATE '2007-01-01', DATE '2008-12-31', '.net', 5 FROM DUAL UNION ALL
SELECT 2222, DATE '2009-01-01', DATE '2012-12-31', '.net', 6 FROM DUAL UNION ALL
SELECT 2222, DATE '2013-01-01', DATE '2016-12-31', 'java', 7 FROM DUAL UNION ALL
SELECT 3333, DATE '2005-01-01', DATE '2007-12-31', 'java', 8 FROM DUAL UNION ALL
SELECT 3333, DATE '2007-01-01', DATE '2008-12-31', '.net', 9 FROM DUAL UNION ALL
SELECT 3333, DATE '2009-01-01', DATE '2013-12-31', 'java', 10 FROM DUAL UNION ALL
SELECT 3333, DATE '2014-01-01', DATE '2016-12-31', '.net', 11 FROM DUAL UNION ALL
SELECT 4444, DATE '2007-01-01', DATE '2009-12-31', 'java', 12 FROM DUAL UNION ALL
SELECT 4444, DATE '2010-01-01', DATE '2012-12-31', 'java', 13 FROM DUAL UNION ALL
SELECT 4444, DATE '2013-01-01', DATE '2015-12-31', '.net', 14 FROM DUAL UNION ALL
SELECT 4444, DATE '2016-01-01', DATE '2016-12-31', '.net', 15 FROM DUAL
)
SELECT DISTINCT user_id
FROM (
SELECT user_id,
LEAD( course_name )
OVER ( PARTITION BY user_id, course_name ORDER BY start_date )
AS next_same_course,
LEAD( course_name )
OVER ( PARTITION BY user_id ORDER BY start_date )
AS next_course
FROM table_name
)
WHERE next_same_course IS NOT NULL
AND next_course <> next_same_course;