Cleaning data in R

Time: 2018-02-14 19:13:48

Tags: r dplyr

My trip data looks like this:

ClientID <- c("45675")
Date <- c("10/10/2016")
PickUpAddress <- c("123 Street", "45 Way", "66 Blvd")
DropOffAddress <- c("45 Way", "66 Blvd", "123 Street")
PickUpTime <- c("08:00", "17:00", "18:00")
DropOffTime <- c("8:30", "17:30", "19:00")

df <- data.frame(ClientID, Date, PickUpAddress, DropOffAddress, PickUpTime, DropOffTime)

df
  ClientID       Date PickUpAddress DropOffAddress PickUpTime DropOffTime
1    45675 10/10/2016    123 Street         45 Way      08:00        8:30
2    45675 10/10/2016        45 Way        66 Blvd      17:00       17:30
3    45675 10/10/2016       66 Blvd     123 Street      18:00       19:00

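However, the real data has thousands of records for the year, and each client has a different number of trips.

The third row in this example is a return trip (back to where the original trip started). I want to remove all return trips from the data.

Any suggestions?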

1 Answer:

Answer 0: (score: 0)

You can try the following solution, which is based on defining each client's home address.

library(dplyr)
library(lubridate)

# create date/time format variables
df$Date_PickUpTime <- paste(df$Date, df$PickUpTime, sep = " ")
df$Date_DropOffTime <- paste(df$Date, df$DropOffTime, sep = " ")

df$Date_PickUpTime <- mdy_hm(df$Date_PickUpTime)
df$Date_DropOffTime <- mdy_hm(df$Date_DropOffTime)

str(df) # as you can see Date_PickUpTime and Date_DropOffTime are in POSIXct format

# define the client home address
df %>%
  group_by(ClientID) %>%                 # group by client
  arrange(Date_PickUpTime) %>%           # order the data by Date_PickUpTime
  mutate(HomeAddress = PickUpAddress[1]) # client home address is the first PickUpAddress

# ... then add filter to the above code

df %>%
  group_by(ClientID) %>%                     # group by client
  arrange(Date_PickUpTime) %>%               # order the data by Date_PickUpTime
  mutate(HomeAddress = PickUpAddress[1]) %>% # client home address
  filter(DropOffAddress != HomeAddress)      # keep only trips that do not end at HomeAddress,
                                             # so the return trip (3rd row) is dropped
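
Since the question mentions a full year of records, one possible variation is to define the home address per client and per day rather than once per client. The sketch below is not part of the original answer. It assumes that the first pick-up of each day is the client's starting point, that dates are in month/day/year order, and it adds a made-up second day (10/11/2016) to the sample data purely for illustration.

```r
library(dplyr)
library(lubridate)

# Sample data from the question plus a hypothetical second day (assumption)
trips <- data.frame(
  ClientID       = "45675",
  Date           = c("10/10/2016", "10/10/2016", "10/10/2016",
                     "10/11/2016", "10/11/2016"),
  PickUpAddress  = c("123 Street", "45 Way", "66 Blvd", "9 Lane", "45 Way"),
  DropOffAddress = c("45 Way", "66 Blvd", "123 Street", "45 Way", "9 Lane"),
  PickUpTime     = c("08:00", "17:00", "18:00", "09:00", "16:00"),
  DropOffTime    = c("8:30", "17:30", "19:00", "9:30", "16:30"),
  stringsAsFactors = FALSE
)

no_returns <- trips %>%
  # build a POSIXct pick-up timestamp (assumes month/day/year dates)
  mutate(Date_PickUpTime = mdy_hm(paste(Date, PickUpTime))) %>%
  # order each client's trips chronologically
  arrange(ClientID, Date_PickUpTime) %>%
  # one group per client per day
  group_by(ClientID, Date) %>%
  # assumption: the day's first pick-up address is where the client started
  mutate(HomeAddress = first(PickUpAddress)) %>%
  # drop trips that end back at that starting address
  filter(DropOffAddress != HomeAddress) %>%
  ungroup()

no_returns
```

With grouping by ClientID alone, the home address would be "123 Street" for every day, so the trip back to "9 Lane" on the second day would not be recognised as a return trip. Whether per-day grouping is appropriate depends on how the real data is structured, for example whether trips can span midnight.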