I have two date columns in two different tables, as shown below.
A: response table:
key response_date
1 2013/01/01
1 2015/12/01
2 2016/02/01
3 2016/08/01
3 2016/09/01
B: Call table
key attempt call_date
1 1 2014/11/20
1 2 2015/09/01
2 3 2016/01/01
2 4 2016/03/01
2 5 2016/10/15
3 6 2016/03/01
3 7 2016/07/01
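Calls and responses are matched where the key in the call table equals the key in the response table, and each call has at most one response. I want to find the time to response. A response happens after a call, and it should be the closest response after that call. For example, for key 1 there are two calls, on 2014/11/20 and 2015/09/01, and two different responses, on 2013/01/01 and 2015/12/01. 2015/12/01 is the response date for the call on 2015/09/01, not for the call on 2014/11/20, because it is closer to the 2015/09/01 call. The 2013/01/01 response comes before both calls, so attempt 1 has no response and its time_diff = 0.
For key 2, call attempts 4 and 5 have no response.
For key 3, attempt 6, there are two responses with key = 3, but both are closer to call attempt 7. So attempt 6 has no response and its time_diff = 0, while the time_diff for attempt 7 is the number of days between (2016/07/01, 2016/08/01), since 2016/08/01 is the closest response after attempt 7.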
key attempt time_diff
1 1 0
1 2 days between(2015/09/01,2015/12/01)
2 3 days between(2016/01/01,2016/02/01)
2 4 0
2 5 0
3 6 0
3 7 days between(2016/07/01,2016/08/01)
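The rule described above can be written out as a small, language-neutral sketch before translating it to SQL or R. The Python below is only an illustration under one assumption: a response belongs to a call when it falls strictly after that call and strictly before the next call for the same key (how to treat a response on the same day as a call is not stated above, so the strict comparisons are a guess). The names `calls`, `responses`, and `time_to_response` are made up for this example.

```python
from datetime import date

# Sample rows copied from the two tables above.
responses = [
    (1, date(2013, 1, 1)), (1, date(2015, 12, 1)),
    (2, date(2016, 2, 1)),
    (3, date(2016, 8, 1)), (3, date(2016, 9, 1)),
]
calls = [
    (1, 1, date(2014, 11, 20)), (1, 2, date(2015, 9, 1)),
    (2, 3, date(2016, 1, 1)), (2, 4, date(2016, 3, 1)), (2, 5, date(2016, 10, 15)),
    (3, 6, date(2016, 3, 1)), (3, 7, date(2016, 7, 1)),
]


def time_to_response(calls, responses):
    """Return (key, attempt, days) rows; days is 0 when a call has no response."""
    rows = []
    for key, attempt, call_date in calls:
        # Date of the next call for the same key, if there is one.
        later_calls = [d for k, _, d in calls if k == key and d > call_date]
        next_call = min(later_calls) if later_calls else None
        # Assumption: a response belongs to this call only if it falls
        # strictly after the call and strictly before the next call.
        matched = [
            r for k, r in responses
            if k == key and r > call_date and (next_call is None or r < next_call)
        ]
        days = (min(matched) - call_date).days if matched else 0
        rows.append((key, attempt, days))
    return rows


result = time_to_response(calls, responses)
for row in result:
    print(row)
```

On the sample data this gives 0 for attempts 1, 4, 5 and 6, and a day count for attempts 2, 3 and 7, which is the pattern in the expected output above.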
Any answer or hint in SQL or R would be appreciated.
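Answer 0 (score: 1)
You didn't specify a SQL dialect, so I wrote this for SQL Server. It may need some syntax tweaks to work in another DBMS, but here is the general idea to get you started: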
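-- a = response table (key, response_date), b = call table (key, attempt, call_date).
-- Assumes both date columns are stored as DATE/DATETIME and SQL Server 2012+ (for LEAD).
SELECT
    c.[key] AS [key],
    c.attempt AS attempt,
    -- Days from the call to the earliest response that arrives after it
    -- and before the next call for the same key; 0 when there is none.
    COALESCE(DATEDIFF(DAY, c.call_date, MIN(a.response_date)), 0) AS time_diff
FROM
    (
        SELECT
            b.[key] AS [key],
            b.attempt AS attempt,
            b.call_date AS call_date,
            -- Date of the next call for the same key (NULL for the last call)
            LEAD(b.call_date) OVER (PARTITION BY b.[key] ORDER BY b.call_date) AS next_call_date
        FROM
            b
    ) AS c
LEFT JOIN
    a
ON
    a.[key] = c.[key]
    AND a.response_date > c.call_date
    AND (c.next_call_date IS NULL OR a.response_date < c.next_call_date)
GROUP BY
    c.[key],
    c.attempt,
    c.call_date
ORDER BY
    c.[key],
    c.attempt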
Answer 1 (score: 1)
I'm not sure I understand the logic behind the expected result, and I can't reproduce it.
Based on the notation of your expected result, this is the result I would expect:
key attempt time_diff
1 1 days between(2014/11/20,2015/12/01)
1 2 days between(2015/09/01,2015/12/01)
2 3 days between(2016/01/01,2016/02/01)
2 4 0
2 5 0
3 6 days between(2016/03/01,2016/08/01)
3 7 days between(2016/07/01,2016/08/01)
Maybe you can explain why the entries key=1,attempt=1 and key=3,attempt=6 have time_diff = 0 in your example.
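To make the two disputed rows concrete, here is a small Python sketch of the reading used in the table above: every call simply takes the first response that comes after it, whether or not a later call sits in between. The function name `days_to_first_response_after` and the variable names are invented for this illustration, and the dates are the ones from the question.

```python
from datetime import date


def days_to_first_response_after(call_date, response_dates):
    """Days from call_date to the earliest later response, or 0 if there is none."""
    later = [r for r in response_dates if r > call_date]
    return (min(later) - call_date).days if later else 0


# Response dates from the question for keys 1 and 3.
key1_responses = [date(2013, 1, 1), date(2015, 12, 1)]
key3_responses = [date(2016, 8, 1), date(2016, 9, 1)]

# The two rows where this reading differs from the question's expected output.
naive = {
    (1, 1): days_to_first_response_after(date(2014, 11, 20), key1_responses),
    (3, 6): days_to_first_response_after(date(2016, 3, 1), key3_responses),
}
print(naive)  # {(1, 1): 376, (3, 6): 153}
```

Under this reading both rows get a nonzero day count, so the zeros in the question must come from an extra rule that stops a response from being credited to an earlier call once a later call exists.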
Answer 2 (score: 1)
Hope the R solution below helps!
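library(dplyr)
library(tidyr)  # replace_na() comes from tidyr, not dplyr

# Convert the character columns to Date
response_table$response_date <- as.Date(response_table$response_date)
call_table$call_date <- as.Date(call_table$call_date)

call_table %>%
  # Pair every call with every response that shares its key
  left_join(response_table, by = "key") %>%
  mutate(date_diff = as.numeric(response_date - call_date)) %>%
  # Keep only responses that come after the call
  filter(date_diff > 0) %>%
  # Within each key, keep the single call/response pair with the smallest gap
  group_by(key) %>%
  filter(which.min(date_diff) == row_number()) %>%
  ungroup() %>%
  mutate(time_diff = paste0('days between(', call_date, ',', response_date, ')')) %>%
  # Bring back the calls that were filtered out; they get time_diff = '0'
  right_join(call_table, by = c("key", "attempt")) %>%
  select(key, attempt, time_diff) %>%
  replace_na(list(time_diff = '0')) %>%
  arrange(key, attempt)  # fix the row order regardless of how the join orders rows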
The output is:
key attempt time_diff
1 1 1 0
2 1 2 days between(2015-09-01,2015-12-01)
3 2 3 days between(2016-01-01,2016-02-01)
4 2 4 0
5 2 5 0
6 3 6 0
7 3 7 days between(2016-07-01,2016-08-01)
Sample data:
response_table <- structure(list(key = c(1L, 1L, 2L, 3L, 3L), response_date = c("2013/01/01",
"2015/12/01", "2016/02/01", "2016/08/01", "2016/09/01")), .Names = c("key",
"response_date"), class = "data.frame", row.names = c(NA, -5L
))
call_table <- structure(list(key = c(1L, 1L, 2L, 2L, 2L, 3L, 3L), attempt = 1:7,
call_date = c("2014/11/20", "2015/09/01", "2016/01/01", "2016/03/01",
"2016/10/15", "2016/03/01", "2016/07/01")), .Names = c("key",
"attempt", "call_date"), class = "data.frame", row.names = c(NA,
-7L))