I have two different kinds of data (rentals and searches), and I want to match each rental with a search based on user and time.
Example:
For each rental, I want to find the same user's last search before that rental.
I would like a result like this:
Id Type UserId Time
1 Rental 1 15:35
2 Search 2 15:34
3 Search 1 15:33
4 Search 1 15:32
Thanks
Answer 0 (score: 0)
Here is one option using case_when and hm (from lubridate):
library(dplyr)
library(lubridate)
df %>% group_by(UserId) %>%
  mutate(T = time_length(hm(Time)), # we need time_length because hm returns a Period object
         `Search Id` = case_when(
           # in groups with at least two rows, give the Rental row the Id of the latest Search
           n() >= 2 & Type == 'Rental' ~ Id[T == max(T[Type == 'Search'])],
           TRUE ~ NA_integer_
         ))
# A tibble: 4 x 6
# Groups: UserId [2]
Id Type UserId Time T `Search Id`
<int> <chr> <int> <chr> <dbl> <int>
1 1 Rental 1 15:35 56100 3
2 2 Search 2 15:34 56040 NA
3 3 Search 1 15:33 55980 NA
4 4 Search 1 15:32 55920 NA
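Note that the case_when version takes the user's latest search regardless of its time; it returns the right Id here only because both of user 1's searches happen before the rental. Below is a minimal sketch that enforces the "before the rental" condition explicitly. It assumes Time is an "HH:MM" string on a single day and Id is an integer; the helper last_search_before() is hypothetical and not part of dplyr or lubridate.

```r
library(dplyr)
library(lubridate)

# Sample data rebuilt from the table in the question
df <- data.frame(
  Id     = 1:4,
  Type   = c("Rental", "Search", "Search", "Search"),
  UserId = c(1L, 2L, 1L, 1L),
  Time   = c("15:35", "15:34", "15:33", "15:32"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each Rental row, return the Id of the latest
# Search strictly earlier than it; non-Rental rows (or rentals with no
# earlier search) get NA.
last_search_before <- function(type, secs, id) {
  vapply(seq_along(id), function(i) {
    if (type[i] != "Rental") return(NA_integer_)
    earlier <- which(type == "Search" & secs < secs[i])
    if (length(earlier) == 0) return(NA_integer_)
    id[earlier[which.max(secs[earlier])]]
  }, integer(1))
}

result <- df %>%
  mutate(secs = period_to_seconds(hm(Time))) %>%  # "15:35" -> 56100 seconds
  group_by(UserId) %>%                            # only compare rows of the same user
  mutate(`Search Id` = last_search_before(Type, secs, Id)) %>%
  ungroup()

print(result)
```

The explicit loop is not the fastest option for large data, but each rental is checked only against that user's earlier searches, so it also handles users with several rentals.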
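However, I think a better approach is to split your data frame into two data frames and do an ordinary left (or right) join, for example: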
dfren <- filter(df, Type == 'Rental')
dfser <- filter(df, Type == 'Search') %>%
  group_by(UserId) %>%
  filter(time_length(hm(Time)) == max(time_length(hm(Time))))  # keep each user's latest search
dfren %>% left_join(dfser, by = 'UserId', suffix = c(".Ren", ".Ser"))
Id.Ren Type.Ren UserId Time.Ren Id.Ser Type.Ser Time.Ser
1 1 Rental 1 15:35 3 Search 15:33
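With dplyr 1.1.0 or later, the same two-data-frame idea can enforce the "before the rental" condition directly through an inequality join: join_by() combined with closest() keeps, for each rental, only the nearest search by the same user that is strictly earlier. This is a sketch under the same assumptions as before (single day, "HH:MM" times); the column names rent_secs, search_secs, search_id and search_time are made up for illustration.

```r
library(dplyr)      # join_by() and closest() require dplyr >= 1.1.0
library(lubridate)

# Sample data rebuilt from the table in the question
df <- data.frame(
  Id     = 1:4,
  Type   = c("Rental", "Search", "Search", "Search"),
  UserId = c(1L, 2L, 1L, 1L),
  Time   = c("15:35", "15:34", "15:33", "15:32"),
  stringsAsFactors = FALSE
)

rentals <- df %>%
  filter(Type == "Rental") %>%
  mutate(rent_secs = period_to_seconds(hm(Time)))

searches <- df %>%
  filter(Type == "Search") %>%
  mutate(search_secs = period_to_seconds(hm(Time))) %>%
  select(UserId, search_id = Id, search_time = Time, search_secs)

# Match on UserId, then among that user's searches with search_secs < rent_secs
# keep only the closest one (the latest search before the rental).
# Rentals without any earlier search keep NA in the search columns.
matched <- rentals %>%
  left_join(searches, by = join_by(UserId, closest(rent_secs > search_secs)))

print(matched)
```

Unlike filtering each user's overall latest search, this matches every rental independently, so a user with several rentals gets the appropriate earlier search for each one.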