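How to find the minimum difference between dates in two different columns under certain conditions

Time: 2018-02-14 23:05:12

Tags: sql r

I have two date columns in two different tables, as shown below: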

A: response table:
key    response_date 
1      2013/01/01
1      2015/12/01
2      2016/02/01
3      2016/08/01
3      2016/09/01

B: Call table
key  attempt  call_date
1    1        2014/11/20
1    2        2015/09/01
2    3        2016/01/01
2    4        2016/03/01
2    5        2016/10/15
3    6        2016/03/01
3    7        2016/07/01

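There is at most one response for each call when the key in the call table matches the key in the response table. I want to find the time to response. A response occurs after a call, and it should be the nearest response following that call. For example, for key 1 there are two calls, on 2014/11/20 and 2015/09/01, and two responses, on 2013/01/01 and 2015/12/01. 2015/12/01 is the response date for the call on 2015/09/01, not for the call on 2014/11/20, because it is closer to the 2015/09/01 call. The response on 2013/01/01 precedes both calls and matches neither, so attempt 1 has no response and time_diff = 0.

For key 2, call attempts 4 and 5 have no response.

For key 3, attempt 6, there are two responses with key = 3, but both are closer to call attempt 7. So attempt 6 has no response and time_diff = 0, and time_diff for attempt 7 is the number of days between (2016/07/01, 2016/08/01), which is the nearest response after attempt 7.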

key  attempt  time_diff
1    1        0
1    2        days between(2015/09/01,2015/12/01)
2    3        days between(2016/01/01,2016/02/01)
2    4        0
2    5        0
3    6        0
3    7        days between(2016/07/01,2016/08/01)

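Any answer or hint in SQL or R would be appreciated.

3 answers:

Answer 0: (score: 1)

You didn't specify the SQL dialect, so I wrote this for SQL Server. It may need some syntax adjustments to work in another DBMS, but here is the general idea: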

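-- a = response table, b = call table; date_A, date_B and activity presumably
-- correspond to response_date, call_date and attempt in the question.
-- DATEDIFF(DAY, start, end) returns end minus start in days.
SELECT 
    b.[key]         AS  [key],
    b.activity      AS  activity,
    CASE WHEN DATEDIFF(DAY, a.date_A, b.date_B) = c.max_time 
         THEN c.max_time
         ELSE 0
    END             AS  time_diff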
FROM
    b
JOIN
    (
    SELECT 
        b.[key]                                 AS  [key],
        MAX(DATEDIFF(DAY, a.date_A, b.date_B))  AS  max_time
    FROM
        a
    JOIN
        b 
    ON  
        a.[key] =  b.[key]
    GROUP BY 
        b.[key]
    ) AS c
ON
    b.[key] = c.[key]
JOIN
    a
ON
    b.[key] = a.[key]

Answer 1: (score: 1)

I'm not sure I understand (nor can I reproduce) the logic behind the expected result.

Going by the notation of your expected result, this is what I would expect:

key  activity  time_diff
1    1         days between(2014/11/20,2015/12/01)
1    2         days between(2015/09/01,2015/12/01)
2    3         days between(2016/01/01,2016/02/01)
2    4         0
2    5         0
3    6         days between(2016/03/01,2016/08/01)
3    7         days between(2016/07/01,2016/08/01)

Perhaps you can explain why the entries key=1,activity=1 and key=3,activity=6 have time_diff = 0 in your example.
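One reading that would produce the values expected in this answer is to give every call the first response that comes after it, whether or not a later call sits in between. The following is a minimal base R sketch under that assumption. It is not taken from the original answer: the helper name days_to_next_response is hypothetical, and the data frames reuse the table and column names from the question.

```r
# Sample data copied from the question.
response_table <- data.frame(
  key = c(1L, 1L, 2L, 3L, 3L),
  response_date = as.Date(c("2013-01-01", "2015-12-01", "2016-02-01",
                            "2016-08-01", "2016-09-01"))
)
call_table <- data.frame(
  key = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  attempt = 1:7,
  call_date = as.Date(c("2014-11-20", "2015-09-01", "2016-01-01", "2016-03-01",
                        "2016-10-15", "2016-03-01", "2016-07-01"))
)

# Hypothetical helper: days from a call to the first later response with
# the same key, or 0 when no such response exists.
days_to_next_response <- function(k, d) {
  later <- response_table$response_date[
    response_table$key == k & response_table$response_date > d
  ]
  if (length(later) == 0) 0 else as.numeric(min(later) - d)
}

# Apply the helper to every call, indexing rows so the Date class is kept.
call_table$time_diff <- vapply(
  seq_len(nrow(call_table)),
  function(i) days_to_next_response(call_table$key[i], call_table$call_date[i]),
  numeric(1)
)

print(call_table[, c("key", "attempt", "time_diff")])
```

On the sample data this gives 376 and 153 days for attempts 1 and 6, the two rows that the question sets to 0, so the difference between the two tables comes down to whether a response may be credited to more than one call.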

Answer 2: (score: 1)

Hope the R solution below helps!

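library(dplyr)
library(tidyr)  # replace_na() comes from tidyr, not dplyr

# Parse the "YYYY/MM/DD" strings as dates
response_table$response_date <- as.Date(response_table$response_date)
call_table$call_date <- as.Date(call_table$call_date)

call_table %>%
  # pair every call with every response that shares its key
  left_join(response_table, by = "key") %>%
  mutate(date_diff = as.numeric(response_date - call_date)) %>%
  # keep only responses that come after the call
  filter(date_diff > 0) %>%
  # within each key, keep the single closest call/response pair
  group_by(key) %>%
  filter(which.min(date_diff) == row_number()) %>%
  ungroup() %>%
  mutate(time_diff = paste0('days between(',call_date,',',response_date,')')) %>%
  # bring back the calls that had no match; they get time_diff = '0'
  right_join(call_table, by = c("key", "attempt")) %>%
  select(key, attempt, time_diff) %>%
  replace_na(list(time_diff='0')) %>%
  # list the calls in their original order
  arrange(key, attempt)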

The output is:

    key attempt time_diff                                                   
1     1       1 0                                  
2     1       2 days between(2015-09-01,2015-12-01)
3     2       3 days between(2016-01-01,2016-02-01)
4     2       4 0                                  
5     2       5 0                                  
6     3       6 0                                  
7     3       7 days between(2016-07-01,2016-08-01)

Sample data:

response_table <- structure(list(key = c(1L, 1L, 2L, 3L, 3L), response_date = c("2013/01/01", 
"2015/12/01", "2016/02/01", "2016/08/01", "2016/09/01")), .Names = c("key", 
"response_date"), class = "data.frame", row.names = c(NA, -5L
))

call_table <- structure(list(key = c(1L, 1L, 2L, 2L, 2L, 3L, 3L), attempt = 1:7, 
    call_date = c("2014/11/20", "2015/09/01", "2016/01/01", "2016/03/01", 
    "2016/10/15", "2016/03/01", "2016/07/01")), .Names = c("key", 
"attempt", "call_date"), class = "data.frame", row.names = c(NA, 
-7L))
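
The dplyr pipeline in this answer groups by key, so it keeps at most one matched call per key, which is enough for the sample data. As a hedged alternative, here is a minimal base R sketch of the rule as the question describes it: each response is assigned to the latest call before it, and a call with several assigned responses keeps the nearest one. This is an illustration rather than part of the original answer, and it assumes that a response on the same day as a call does not count.

```r
# Sample data from the question, with dates already parsed.
response_table <- data.frame(
  key = c(1L, 1L, 2L, 3L, 3L),
  response_date = as.Date(c("2013-01-01", "2015-12-01", "2016-02-01",
                            "2016-08-01", "2016-09-01"))
)
call_table <- data.frame(
  key = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  attempt = 1:7,
  call_date = as.Date(c("2014-11-20", "2015-09-01", "2016-01-01", "2016-03-01",
                        "2016-10-15", "2016-03-01", "2016-07-01"))
)

# Every call starts with no response (time_diff = 0).
call_table$time_diff <- 0

for (j in seq_len(nrow(response_table))) {
  # Calls with the same key that were made before this response.
  idx <- which(call_table$key == response_table$key[j] &
                 call_table$call_date < response_table$response_date[j])
  if (length(idx) == 0) next  # response with no preceding call

  # The latest of those calls is the one this response belongs to.
  i <- idx[which.max(as.numeric(call_table$call_date[idx]))]
  d <- as.numeric(response_table$response_date[j] - call_table$call_date[i])

  # If several responses map to the same call, keep the smallest gap.
  if (call_table$time_diff[i] == 0 || d < call_table$time_diff[i]) {
    call_table$time_diff[i] <- d
  }
}

print(call_table[, c("key", "attempt", "time_diff")])
```

On the sample data this gives 91, 31 and 31 days for attempts 2, 3 and 7 and 0 for the rest, matching the expected result in the question, and it can match more than one call per key when the data calls for it.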