How to match a value in one row against a different column in the immediately adjacent row in R

Time: 2019-04-23 17:41:15

Tags: r

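I have data on customers who bought airline tickets online, and I want to see how many of them booked a return trip. Basically, for the same person and account, I want to match the origin city in one row with the destination city in the immediately adjacent row, and vice versa. That would give me their round-trip data, from which I want to calculate the number of days they traveled. I am trying to do this in R, but I cannot work out how to match the origin with the destination of the adjacent row and vice versa.

I have sorted the data by customer account number to check manually whether there are return trips, and there are many.

The data looks like this: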

Account number          origin city Destination city    Date
1                     London    chicago              7/22/2018
2                      Milan    London               7/23/2018
2                      London    Milan               7/28/2018
1                     chicago    london              8/22/2018

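1 answer:

Answer 0 (score: 2)

Another option is to join the table to itself with the origin and destination fields swapped.

EDIT: added "trip_num" to better handle repeat trips by the same person.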

library(dplyr)
# First, convert date field to Date type
df <- df %>% 
  mutate(Date = lubridate::mdy(Date)) %>%
  # update with M-M's suggestion in comments
  mutate_at(.vars = vars(origin_city, Destination_city), .funs = toupper) %>%
  # EDIT: adding trip_num to protect against extraneous joins for repeat trips
  # (row_number() follows row order, so this assumes rows are already in date order)
  group_by(Account_number, origin_city, Destination_city) %>%
  mutate(trip_num = row_number()) %>%
  ungroup()

df2 <- df %>%
  left_join(df, by = c("Account_number", "trip_num",
                       "origin_city" = "Destination_city",
                       "Destination_city" = "origin_city")) %>%
  mutate(days = (Date.x - Date.y)/lubridate::ddays(1))


> df2
# A tibble: 6 x 7
  Account_number origin_city Destination_city Date.x     trip_num Date.y      days
           <int> <chr>       <chr>            <date>        <int> <date>     <dbl>
1              1 LONDON      CHICAGO          2018-07-22        1 2018-08-22   -31
2              2 MILAN       LONDON           2018-07-23        1 2018-07-28    -5
3              2 LONDON      MILAN            2018-07-28        1 2018-07-23     5
4              1 CHICAGO     LONDON           2018-08-22        1 2018-07-22    31
5              2 MILAN       LONDON           2018-08-23        2 2018-08-28    -5
6              2 LONDON      MILAN            2018-08-28        2 2018-08-23     5
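
The joined table lists every leg twice, once from each direction, so answering the original question (how many customers booked a return trip, and how long they were away) needs one more step. The following is a minimal sketch, not part of the original answer: it keeps only the outbound leg of each pair and counts the accounts. The names `legs`, `round_trips`, `days_away` and the `_out`/`_back` suffixes are illustrative assumptions, and base `as.Date()` stands in for `lubridate::mdy()` so that dplyr is the only package required.

```r
library(dplyr)

# Sample data from the answer (account 2 makes the same round trip twice)
df <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
  text = "Account_number origin_city Destination_city Date
1 London chicago 7/22/2018
2 Milan London 7/23/2018
2 London Milan 7/28/2018
1 chicago london 8/22/2018
2 Milan London 8/23/2018
2 London Milan 8/28/2018")

legs <- df %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%Y"),
         origin_city = toupper(origin_city),
         Destination_city = toupper(Destination_city)) %>%
  # sort first so that trip_num numbers repeat trips in date order
  arrange(Account_number, Date) %>%
  group_by(Account_number, origin_city, Destination_city) %>%
  mutate(trip_num = row_number()) %>%
  ungroup()

round_trips <- legs %>%
  # inner_join drops legs that have no matching reverse leg
  inner_join(legs,
             by = c("Account_number", "trip_num",
                    "origin_city" = "Destination_city",
                    "Destination_city" = "origin_city"),
             suffix = c("_out", "_back")) %>%
  # keep each pair once: the row whose own date is the earlier one
  filter(Date_back > Date_out) %>%
  mutate(days_away = as.numeric(Date_back - Date_out, units = "days"))

n_round_trips <- nrow(round_trips)                      # 3 round trips
n_accounts <- n_distinct(round_trips$Account_number)    # 2 accounts
```

Pairing the n-th A-to-B leg with the n-th B-to-A leg is a heuristic. It works for this sample, but an account with an unmatched one-way leg in between could shift the numbering, so real data may need extra checks.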

Data (with a repeat trip added for account 2):

df <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
  text = "Account_number          origin_city Destination_city    Date
1                     London    chicago              7/22/2018
2                      Milan    London               7/23/2018
2                      London    Milan               7/28/2018
1                     chicago    london              8/22/2018
2                      Milan    London               8/23/2018
2                      London    Milan               8/28/2018")
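
To see why the "trip_num" key matters for the repeat trip in this data, the sketch below runs the self-join with and without it and compares row counts. This is an illustration rather than part of the original answer: the names `without_key` and `with_key` are hypothetical, and `as.Date()` is used instead of `lubridate::mdy()` to keep dplyr the only dependency.

```r
library(dplyr)

df <- read.table(
  header = TRUE,
  stringsAsFactors = FALSE,
  text = "Account_number origin_city Destination_city Date
1 London chicago 7/22/2018
2 Milan London 7/23/2018
2 London Milan 7/28/2018
1 chicago london 8/22/2018
2 Milan London 8/23/2018
2 London Milan 8/28/2018") %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%Y"),
         origin_city = toupper(origin_city),
         Destination_city = toupper(Destination_city))

# Without trip_num, each of account 2's four legs matches BOTH reverse legs.
# Recent dplyr versions may warn about a many-to-many relationship here.
without_key <- df %>%
  left_join(df, by = c("Account_number",
                       "origin_city" = "Destination_city",
                       "Destination_city" = "origin_city"))

# With trip_num, the n-th leg only matches the n-th reverse leg.
keyed <- df %>%
  group_by(Account_number, origin_city, Destination_city) %>%
  mutate(trip_num = row_number()) %>%
  ungroup()

with_key <- keyed %>%
  left_join(keyed, by = c("Account_number", "trip_num",
                          "origin_city" = "Destination_city",
                          "Destination_city" = "origin_city"))

nrow(without_key)  # 10: 2 rows for account 1, 4 * 2 = 8 for account 2
nrow(with_key)     # 6: one row per original leg
```

The four extra rows in `without_key` pair a July leg with an August leg, which would produce misleading trip lengths.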