Question

I have two kinds of records (rentals and searches). I want to match rentals to searches by user and time.


I want to find the last search made by the same user before a rental.

I would like a result like this:

Id  Type    UserId  Time
1   Rental  1       15:35
2   Search  2       15:34
3   Search  1       15:33
4   Search  1       15:32

Thanks.

Answer 1

Here is one option using case_when and hm:

library(dplyr)
library(lubridate)

df %>%
  group_by(UserId) %>%
  mutate(
    # hm() returns a Period object, so convert it to seconds with time_length()
    T = time_length(hm(Time)),
    `Search Id` = case_when(
      # For a Rental row in a group with at least two rows,
      # return the Id of that user's latest Search
      n() >= 2 & Type == 'Rental' ~ Id[T == max(T[Type == 'Search'])],
      TRUE ~ NA_integer_
    )
  )


# A tibble: 4 x 6
# Groups:   UserId [2]
     Id Type   UserId Time      T `Search Id`
  <int> <chr>   <int> <chr> <dbl>       <int>
1     1 Rental      1 15:35 56100           3
2     2 Search      2 15:34 56040          NA
3     3 Search      1 15:33 55980          NA
4     4 Search      1 15:32 55920          NA
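Note that the snippet above takes the user's latest search overall and does not check that it actually happened before the rental. A minimal sketch of a stricter variant follows, assuming the sample data from the question's table, at most a handful of rows per user, and times that all fall on the same day. The helper `last_search_before` and the column names `secs` and `SearchId` are hypothetical names introduced only for this illustration.

```r
library(dplyr)
library(lubridate)

# Sample data copied from the question's table.
df <- data.frame(
  Id     = 1:4,
  Type   = c("Rental", "Search", "Search", "Search"),
  UserId = c(1L, 2L, 1L, 1L),
  Time   = c("15:35", "15:34", "15:33", "15:32"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for row i of one user's group, return the Id of the
# latest Search strictly before that row's time, or NA if there is none.
last_search_before <- function(i, id, type, secs) {
  if (type[i] != "Rental") return(NA_integer_)
  earlier <- type == "Search" & secs < secs[i]
  if (!any(earlier)) return(NA_integer_)
  id[earlier][which.max(secs[earlier])]
}

res <- df %>%
  mutate(secs = time_length(hm(Time))) %>%   # seconds since midnight
  group_by(UserId) %>%
  mutate(SearchId = vapply(seq_along(Id), last_search_before, integer(1),
                           id = Id, type = Type, secs = secs)) %>%
  ungroup()
```

Because each rental row is compared only against earlier searches of the same user, this also copes with users who have several rentals, at the cost of a per-row loop inside each group.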

However, I think it would be better to split your data frame into two and do an ordinary left or right join, for example:

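# Rentals on one side
dfren <- filter(df, Type == 'Rental')
# Each user's latest search on the other
dfser <- filter(df, Type == 'Search') %>%
  group_by(UserId) %>%
  filter(time_length(hm(Time)) == max(time_length(hm(Time))))

# Attach each user's latest search to their rental
dfren %>% left_join(dfser, by = 'UserId', suffix = c(".Ren", ".Ser"))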

  Id.Ren Type.Ren UserId Time.Ren Id.Ser Type.Ser Time.Ser
1      1   Rental      1    15:35      3   Search    15:33
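The join above keeps each user's latest search before joining, so it would pair a rental with a search made after it, and it assumes one rental per user. One possible way around both, sketched here under the assumption that the data match the question's table and all times are on the same day, is to join first and only then keep searches earlier than the rental. The names `rentals`, `searches`, `matched` and `secs` are illustrative, not part of the original answer.

```r
library(dplyr)
library(lubridate)

# Sample data copied from the question's table, plus seconds since midnight.
df <- data.frame(
  Id     = 1:4,
  Type   = c("Rental", "Search", "Search", "Search"),
  UserId = c(1L, 2L, 1L, 1L),
  Time   = c("15:35", "15:34", "15:33", "15:32"),
  stringsAsFactors = FALSE
) %>%
  mutate(secs = time_length(hm(Time)))

rentals  <- filter(df, Type == "Rental")
searches <- filter(df, Type == "Search")

matched <- rentals %>%
  # Pair every rental with every search by the same user
  inner_join(searches, by = "UserId", suffix = c(".Ren", ".Ser")) %>%
  # Keep only searches made before the rental
  filter(secs.Ser < secs.Ren) %>%
  # For each rental, keep the most recent of those searches
  group_by(Id.Ren) %>%
  slice_max(secs.Ser, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(Id.Ren, UserId, Time.Ren, Id.Ser, Time.Ser)
```

With an inner join, rentals that have no earlier search are dropped from `matched`; joining `rentals` back with a left join would restore them with missing search columns if that is preferred.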
