Question

Suppose the following data is only a small part of the very large data set I am working with:

mydf<-data.frame(Date=as.Date(c("2015-01-01","2015-01-10","2015-01-27","2015-02-27","2015-03-15","2015-04-17","2015-04-18")),Expense=c(1566,5646,3456,6546,5313,6466,5456),Details=c('item101 xsda','fuel asa','item102a','fuel asa','fuel sda','fuel','item102a'),Vehicle=c('Car','Bike','Car','Car','Bike','Bike','Bike'),Person=c('John','Smith','Robin',rep(NA,3),'Robin'))

Date           Expense      Details        Vehicle    Person
1 2015-01-01    1566        item101 xsda   Car        John
2 2015-01-10    5646        fuel asa       Bike       Smith
3 2015-01-27    3456        item102a       Car        Robin
4 2015-02-27    6546        fuel asa       Car        <NA>
5 2015-03-15    5313        fuel sda       Bike       <NA>
6 2015-04-17    6466        fuel           Bike       <NA>
7 2015-04-18    5456        item102a       Bike       Robin

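There are two rules to apply:

1) When the vehicle is "Car" and "fuel" was bought, the person is John.

2) When the vehicle is "Bike" and "fuel" was bought, the person is Smith.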

The output I want is:

     Date       Expense  Details        Vehicle    Person
 1 2015-01-01    1566    item101 xsda     Car      John
 2 2015-01-10    5646    fuel             Bike     Smith
 3 2015-01-27    3456    item102a         Car      Robin
 4 2015-02-27    6546    fuel             Car      John
 5 2015-03-15    5313    fuel sda         Bike     Smith
 6 2015-04-17    6466    fuel             Bike     Smith
 7 2015-04-18    5456    item102a         Bike     Robin

How can I solve this? I used the following steps and got halfway to a solution:

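# Make sure Details is character (data.frame() creates factors by default in R < 4.0)
mydf$Details<-as.character(mydf$Details)
# Collapse every entry that mentions fuel ("fuel asa", "fuel sda", ...) to "Fuel"
mydf$Details[grepl('fuel',mydf$Details,ignore.case=TRUE)]<-'Fuel'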

After that, mydf looks like this:

    Date     Expense      Details        Vehicle    Person
1 2015-01-01    1566      item101 xsda   Car        John
2 2015-01-10    5646      Fuel           Bike       Smith
3 2015-01-27    3456      item102a       Car        Robin
4 2015-02-27    6546      Fuel           Car        <NA>
5 2015-03-15    5313      Fuel           Bike       <NA>
6 2015-04-17    6466      Fuel           Bike       <NA>
7 2015-04-18    5456      item102a       Bike       Robin

Note: please avoid loops if possible. If there is a better, faster approach, please share it.
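One loop-free way to apply both rules at once is to store them as data and look them up by vehicle. A minimal base-R sketch, assuming that only rows with a missing `Person` should be filled; the lookup name `person_for_vehicle` is hypothetical. It leaves `Details` unchanged:

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the
# text columns as character vectors on older R versions too
mydf <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", NA, NA, NA, "Robin"),
  stringsAsFactors = FALSE
)

# The two rules as a (hypothetical) named lookup: vehicle -> person
person_for_vehicle <- c(Car = "John", Bike = "Smith")

# Rows to fill: Person is missing and Details mentions fuel (any case)
to_fill <- is.na(mydf$Person) &
  grepl("fuel", mydf$Details, ignore.case = TRUE)

# Vectorised lookup: index the named vector by each selected row's Vehicle
mydf$Person[to_fill] <- unname(person_for_vehicle[mydf$Vehicle[to_fill]])

print(mydf)
```

A vehicle with no entry in the lookup yields `NA`, so such rows simply stay missing. Adding a rule only means adding one element to the lookup vector.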

Answer 1

You're nearly there. Try these two lines:

# Run after your step that sets Details to 'Fuel'
mydf$Person[mydf$Details=='Fuel' & mydf$Vehicle=='Car']  <- 'John'
mydf$Person[mydf$Details=='Fuel' & mydf$Vehicle=='Bike']  <- 'Smith'
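These two lines overwrite `Person` on every "Fuel" row, including rows that already name someone. A sketch that combines them with the question's normalisation step and, assuming existing names should be kept, adds an `is.na()` guard (data trimmed to the three columns involved):

```r
mydf <- data.frame(
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", NA, NA, NA, "Robin"),
  stringsAsFactors = FALSE
)

# Step 1 (from the question): collapse every fuel variant to "Fuel"
mydf$Details[grepl("fuel", mydf$Details, ignore.case = TRUE)] <- "Fuel"

# Step 2 (this answer), guarded so that existing names are kept
is_fuel <- mydf$Details == "Fuel"
no_person <- is.na(mydf$Person)
mydf$Person[is_fuel & no_person & mydf$Vehicle == "Car"]  <- "John"
mydf$Person[is_fuel & no_person & mydf$Vehicle == "Bike"] <- "Smith"

print(mydf)
```

On this sample the guard does not change the result, since row 2 already holds Smith; it only matters when a fuel row names someone other than the rule's person.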

Answer 2

You can do it in a couple of lines with data.table:

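library(data.table)

setDT(mydf)  # convert mydf to a data.table in place (by reference)

# %like% is a case-sensitive regex match, so run this on the original lowercase Details
mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Car", Person := "John"]
mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Bike", Person := "Smith"]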

mydf
#>          Date Expense      Details Vehicle Person
#> 1: 2015-01-01    1566 item101 xsda     Car   John
#> 2: 2015-01-10    5646     fuel asa    Bike  Smith
#> 3: 2015-01-27    3456     item102a     Car  Robin
#> 4: 2015-02-27    6546     fuel asa     Car   John
#> 5: 2015-03-15    5313     fuel sda    Bike  Smith
#> 6: 2015-04-17    6466         fuel    Bike  Smith
#> 7: 2015-04-18    5456     item102a    Bike  Robin

With dplyr you can also do a conditional mutate, but the code is longer. I use the stringr package for the string matching:

library(dplyr)
library(stringr)
mydf %>%
  mutate(
    Person = ifelse(
      is.na(Person) & 
        str_detect(Details, "fuel") & 
        Vehicle == "Car", 
      "John",
      ifelse(
        is.na(Person) & 
          str_detect(Details, "fuel") & 
          Vehicle == "Bike", 
        "Smith", 
        as.character(Person)))
  )
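The nested `ifelse()` can be flattened with dplyr's `case_when()`, where each rule is one line and the first matching condition wins. A sketch, using base `grepl()` instead of stringr and assigning the result back to `mydf` (the pipeline above only prints it):

```r
library(dplyr)

mydf <- data.frame(
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", NA, NA, NA, "Robin"),
  stringsAsFactors = FALSE
)

mydf <- mydf %>%
  mutate(Person = case_when(
    # rule 1: missing person, fuel bought with the car -> John
    is.na(Person) & grepl("fuel", Details, ignore.case = TRUE) & Vehicle == "Car"  ~ "John",
    # rule 2: missing person, fuel bought with the bike -> Smith
    is.na(Person) & grepl("fuel", Details, ignore.case = TRUE) & Vehicle == "Bike" ~ "Smith",
    # every other row keeps its current value
    TRUE ~ Person
  ))

print(mydf)
```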
#>         Date Expense      Details Vehicle Person
#> 1 2015-01-01    1566 item101 xsda     Car   John
#> 2 2015-01-10    5646     fuel asa    Bike  Smith
#> 3 2015-01-27    3456     item102a     Car  Robin
#> 4 2015-02-27    6546     fuel asa     Car   John
#> 5 2015-03-15    5313     fuel sda    Bike  Smith
#> 6 2015-04-17    6466         fuel    Bike  Smith
#> 7 2015-04-18    5456     item102a    Bike  Robin

Replace a value in a data frame column only when two other column values match
