My trip data looks like this:
ClientID <- c("45675")
Date <- c("10/10/2016")
PickUpAddress <- c("123 Street", "45 Way", "66 Blvd")
DropOffAddress <- c("45 Way", "66 Blvd", "123 Street")
PickUpTime <- c("08:00", "17:00", "18:00")
DropOffTime <- c("8:30", "17:30", "19:00")
df <- data.frame(ClientID, Date, PickUpAddress, DropOffAddress, PickUpTime, DropOffTime)
df
  ClientID       Date PickUpAddress DropOffAddress PickUpTime DropOffTime
1    45675 10/10/2016    123 Street         45 Way      08:00        8:30
2    45675 10/10/2016        45 Way        66 Blvd      17:00       17:30
3    45675 10/10/2016       66 Blvd     123 Street      18:00       19:00
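The real data, however, has thousands of records for the year and a different number of trips for each client.
The third row in this example is a return trip (the trip back to where the first trip started). I want to remove all return trips from the data.
Any suggestions?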
Answer 0: (score: 0)
You can try the following solution, which is based on defining each client's home address.
library(dplyr)
library(lubridate)
# create date/time format variables
df$Date_PickUpTime <- paste(df$Date, df$PickUpTime, sep = " ")
df$Date_DropOffTime <- paste(df$Date, df$DropOffTime, sep = " ")
df$Date_PickUpTime <- mdy_hm(df$Date_PickUpTime)
df$Date_DropOffTime <- mdy_hm(df$Date_DropOffTime)
str(df) # as you can see Date_PickUpTime and Date_DropOffTime are in POSIXct format
# define the client home address
df %>%
  group_by(ClientID) %>%                  # group by client
  arrange(Date_PickUpTime) %>%            # order the data by Date_PickUpTime
  mutate(HomeAddress = PickUpAddress[1])  # client home address is the first PickUpAddress
# ... then add filter to the above code
df %>%
  group_by(ClientID) %>%                         # group by client
  arrange(Date_PickUpTime) %>%                   # order the data
  mutate(HomeAddress = PickUpAddress[1]) %>%     # client home address
  filter(DropOffAddress != HomeAddress)          # keep only trips whose DropOffAddress
                                                 # differs from HomeAddress, so the
                                                 # return trip (3rd row) is not selected
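Since the real data covers a whole year, one possible variation is to define the home address per client and per day rather than once per client. The sketch below assumes that a client's first pickup on a given day is their home for that day; that is an assumption, not something stated in the question. The `trips` data frame, its extra rows for 10/11/2016, and the `PickUp`, `HomeAddress` and `outbound` names are hypothetical and only there for illustration.

```r
library(dplyr)

# Hypothetical sample: the question's three trips plus two made-up trips
# on the next day that start from a different address
trips <- data.frame(
  ClientID       = "45675",
  Date           = c("10/10/2016", "10/10/2016", "10/10/2016",
                     "10/11/2016", "10/11/2016"),
  PickUpAddress  = c("123 Street", "45 Way", "66 Blvd", "9 Lane", "45 Way"),
  DropOffAddress = c("45 Way", "66 Blvd", "123 Street", "45 Way", "9 Lane"),
  PickUpTime     = c("08:00", "17:00", "18:00", "09:00", "16:00"),
  stringsAsFactors = FALSE  # keep addresses as character so != compares text
)

outbound <- trips %>%
  # parse date and time into POSIXct so trips can be ordered chronologically
  mutate(PickUp = as.POSIXct(paste(Date, PickUpTime),
                             format = "%m/%d/%Y %H:%M", tz = "UTC")) %>%
  group_by(ClientID, Date) %>%                   # one group per client per day
  arrange(PickUp, .by_group = TRUE) %>%          # order trips within each group
  mutate(HomeAddress = first(PickUpAddress)) %>% # assumed home: first pickup of the day
  filter(DropOffAddress != HomeAddress) %>%      # drop trips that end at the assumed home
  ungroup()

outbound
```

With this sample, the 18:00 trip on 10/10/2016 and the 16:00 trip on 10/11/2016 are dropped, leaving three rows. Grouping by `ClientID` alone would keep the second of those, because the home address would then be fixed at the first pickup of the year.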