Suppose the following data is only a small part of the very large dataset I am working with.
mydf <- data.frame(Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27", "2015-03-15", "2015-04-17", "2015-04-18")),
                   Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
                   Details = c('item101 xsda', 'fuel asa', 'item102a', 'fuel asa', 'fuel sda', 'fuel', 'item102a'),
                   Vehicle = c('Car', 'Bike', 'Car', 'Car', 'Bike', 'Bike', 'Bike'),
                   Person = c('John', 'Smith', 'Robin', rep(NA, 3), 'Robin'))
        Date Expense      Details Vehicle Person
1 2015-01-01    1566 item101 xsda     Car   John
2 2015-01-10    5646     fuel asa    Bike  Smith
3 2015-01-27    3456     item102a     Car  Robin
4 2015-02-27    6546     fuel asa     Car   <NA>
5 2015-03-15    5313     fuel sda    Bike   <NA>
6 2015-04-17    6466         fuel    Bike   <NA>
7 2015-04-18    5456     item102a    Bike  Robin
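There are two rules to consider:
1) When the vehicle is "Car" and the purchase is "fuel", the person is John.
2) When the vehicle is "Bike" and the purchase is "fuel", the person is Smith.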
The output I want is:
        Date Expense      Details Vehicle Person
1 2015-01-01    1566 item101 xsda     Car   John
2 2015-01-10    5646         fuel    Bike  Smith
3 2015-01-27    3456     item102a     Car  Robin
4 2015-02-27    6546         fuel     Car   John
5 2015-03-15    5313     fuel sda    Bike  Smith
6 2015-04-17    6466         fuel    Bike  Smith
7 2015-04-18    5456     item102a    Bike  Robin
How can I solve this? I used the following steps and got halfway to a solution:
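# Convert Details to character (data.frame() creates factors by default in R < 4.0)
mydf$Details <- as.character(mydf$Details)
# Replace every Details value containing "fuel" (any letter case) with "Fuel"
mydf$Details[grepl('fuel', mydf$Details, ignore.case = TRUE)] <- 'Fuel'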
Now mydf is:
        Date Expense      Details Vehicle Person
1 2015-01-01    1566 item101 xsda     Car   John
2 2015-01-10    5646         Fuel    Bike  Smith
3 2015-01-27    3456     item102a     Car  Robin
4 2015-02-27    6546         Fuel     Car   <NA>
5 2015-03-15    5313         Fuel    Bike   <NA>
6 2015-04-17    6466         Fuel    Bike   <NA>
7 2015-04-18    5456     item102a    Bike  Robin
Note: please avoid loops if possible. If there is a better, faster way, please share it.
Answer 0 (score: 1)
# Assumes Details has already been recoded to 'Fuel' as shown in the question
mydf$Person[mydf$Details == 'Fuel' & mydf$Vehicle == 'Car'] <- 'John'
mydf$Person[mydf$Details == 'Fuel' & mydf$Vehicle == 'Bike'] <- 'Smith'
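With more than two rules, one assignment per rule gets repetitive. Below is a minimal base-R sketch of the same idea as a single vectorised lookup, assuming the rules can be stored in a named vector (fuel_person is a hypothetical name) and that only rows whose Person is missing should be filled. Because grepl() is called with ignore.case = TRUE directly on the original Details, the 'Fuel' recoding step is not needed:

```r
# Sample data from the question, with text columns kept as character
mydf <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", NA, NA, NA, "Robin"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: vehicle -> person who buys fuel for it
fuel_person <- c(Car = "John", Bike = "Smith")

# Fuel rows (any letter case) that have no Person yet
to_fill <- grepl("fuel", mydf$Details, ignore.case = TRUE) & is.na(mydf$Person)

# Index the lookup by each row's Vehicle; unname() drops the lookup's names
mydf$Person[to_fill] <- unname(fuel_person[mydf$Vehicle[to_fill]])

print(mydf$Person)
```

With the sample data, Person becomes John, Smith, Robin, John, Smith, Smith, Robin, which matches the desired output. Adding a rule means adding one entry to fuel_person rather than another line of code.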
Answer 1 (score: 1)
You can use data.table:
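library(data.table)
# Convert mydf to a data.table in place (by reference)
setDT(mydf)
# Fill Person only where it is missing and Details contains "fuel"
mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Car", Person := "John"]
mydf[is.na(Person) & Details %like% "fuel" & Vehicle == "Bike", Person := "Smith"]
mydf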
#>          Date Expense      Details Vehicle Person
#> 1: 2015-01-01    1566 item101 xsda     Car   John
#> 2: 2015-01-10    5646     fuel asa    Bike  Smith
#> 3: 2015-01-27    3456     item102a     Car  Robin
#> 4: 2015-02-27    6546     fuel asa     Car   John
#> 5: 2015-03-15    5313     fuel sda    Bike  Smith
#> 6: 2015-04-17    6466         fuel    Bike  Smith
#> 7: 2015-04-18    5456     item102a    Bike  Robin
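The rules can also live in a small table and be applied in one data.table statement. This is a sketch, assuming a hypothetical rules table named rules (columns Vehicle and Fill) and that only missing Person values should be filled. The base R function match() finds each selected row's Vehicle in the rules table:

```r
library(data.table)

# Sample data from the question (data.table() keeps strings as character)
mydf <- data.table(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", NA, NA, NA, "Robin")
)

# Hypothetical rules table: one row per vehicle/person rule
rules <- data.table(Vehicle = c("Car", "Bike"), Fill = c("John", "Smith"))

# On fuel rows with no Person, look up each row's Vehicle in rules
# and assign the matching Fill value by reference
mydf[is.na(Person) & Details %like% "fuel",
     Person := rules$Fill[match(Vehicle, rules$Vehicle)]]

print(mydf)
```

A new vehicle/person rule then becomes one more row in rules, and the update line stays the same.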
With dplyr you can also do a conditional mutate, but the code is longer. I use the stringr package for the string matching:
library(dplyr)
library(stringr)
mydf %>%
  mutate(
    Person = ifelse(
      is.na(Person) &
        str_detect(Details, "fuel") &
        Vehicle == "Car",
      "John",
      ifelse(
        is.na(Person) &
          str_detect(Details, "fuel") &
          Vehicle == "Bike",
        "Smith",
        as.character(Person)
      )
    )
  )
#>         Date Expense      Details Vehicle Person
#> 1 2015-01-01    1566 item101 xsda     Car   John
#> 2 2015-01-10    5646     fuel asa    Bike  Smith
#> 3 2015-01-27    3456     item102a     Car  Robin
#> 4 2015-02-27    6546     fuel asa     Car   John
#> 5 2015-03-15    5313     fuel sda    Bike  Smith
#> 6 2015-04-17    6466         fuel    Bike  Smith
#> 7 2015-04-18    5456     item102a    Bike  Robin
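The nested ifelse() calls can possibly be shortened with dplyr's case_when(). This is a sketch, assuming Person is a character column (not a factor), so that TRUE ~ Person can serve as the fall-through branch that keeps every other value unchanged. The helper column needs_fill is a hypothetical name, computed once and dropped at the end:

```r
library(dplyr)
library(stringr)

# Sample data from the question, with text columns kept as character
mydf <- data.frame(
  Date = as.Date(c("2015-01-01", "2015-01-10", "2015-01-27", "2015-02-27",
                   "2015-03-15", "2015-04-17", "2015-04-18")),
  Expense = c(1566, 5646, 3456, 6546, 5313, 6466, 5456),
  Details = c("item101 xsda", "fuel asa", "item102a", "fuel asa",
              "fuel sda", "fuel", "item102a"),
  Vehicle = c("Car", "Bike", "Car", "Car", "Bike", "Bike", "Bike"),
  Person = c("John", "Smith", "Robin", NA, NA, NA, "Robin"),
  stringsAsFactors = FALSE
)

mydf <- mydf %>%
  mutate(
    # Shared condition: Person is missing and Details mentions fuel
    needs_fill = is.na(Person) & str_detect(Details, "fuel"),
    Person = case_when(
      needs_fill & Vehicle == "Car"  ~ "John",
      needs_fill & Vehicle == "Bike" ~ "Smith",
      TRUE                           ~ Person  # all other rows keep their value
    )
  ) %>%
  select(-needs_fill)

print(mydf)
```

case_when() evaluates its conditions in order and uses the first one that is TRUE for each row, so each rule stays on one line instead of adding another level of nesting.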