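I have a data frame like the one below. I want to add 2 columns:
ContainsANZ: indicates whether any of the columns F0 to F3 contains 'Australia' or 'New Zealand', ignoring NA values
AllANZ: indicates whether all non-NA columns contain 'Australia' or 'New Zealand'
The starting data frame would be:
df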
  Col.A Col.B Col.C            F0            F1            F2        F3
1  data     0   xxx     Australia     Singapore          <NA>      <NA>
2  data     1   yyy United States United States United States      <NA>
3  data     0   zzz     Australia     Australia     Australia Australia
4  data     0   ooo     Hong Kong        London     Australia      <NA>
5  data     1   xxx   New Zealand          <NA>          <NA>      <NA>
The final result should look like this:
df
  Col.A Col.B Col.C            F0            F1            F2        F3 ContainsANZ      AllANZ
1  data     0   xxx     Australia     Singapore          <NA>      <NA>   Australia   undefined
2  data     1   yyy United States United States United States      <NA>   undefined   undefined
3  data     0   zzz     Australia     Australia     Australia Australia   Australia   Australia
4  data     0   ooo     Hong Kong        London     Australia      <NA>   Australia   undefined
5  data     1   xxx   New Zealand          <NA>          <NA>      <NA> New Zealand New Zealand
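I am using dplyr (the preferred solution) and came up with the code below, which does not work and is very repetitive. Is there a better way to write this so that I don't have to keep copying the F0 | F1 | F2 ... rule? My real data set has more of them. Will the NAs interfere with the code?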
df <- df %>%
  mutate(ANZFlag =
    ifelse(
      F0 == 'Australia' |
        F1 == 'Australia' |
        F2 == 'Australia' |
        F3 == 'Australia',
      'Australia',
      ifelse(
        F0 == 'New Zealand' |
          F1 == 'New Zealand' |
          F2 == 'New Zealand' |
          F3 == 'New Zealand',
        'New Zealand', 'undefined'
      )
    )
  )
Answer 0 (score: 1)
It takes a bit more typing, but I think this is the essence of what you're looking for:
library(dplyr)
df <- read.table(text='Col.A,Col.B,Col.C,F0,F1,F2,F3
data,0,xxx,Australia,Singapore,NA,NA
data,1,yyy,"United States","United States","United States",NA
data,0,zzz,Australia,Australia,Australia,Australia
data,0,ooo,"Hong Kong",London,Australia,NA
data,1,xxx,"New Zealand",NA,NA,NA', header=TRUE, sep=",", stringsAsFactors=FALSE)
down_under <- function(x) {
  mtch <- c("Australia", "New Zealand")
  # values of F0..F3 for this single row, as a named character vector
  cols <- unlist(x)[c("F0", "F1", "F2", "F3")]
  # ContainsANZ: at least one ANZ country appears in the row
  # AllANZ: every non-NA value in the row is an ANZ country
  bind_cols(x, data_frame(ContainsANZ=any(mtch %in% cols, na.rm=TRUE),
                          AllANZ=all(as.vector(na.omit(cols)) %in% mtch)))
}
rowwise(df) %>% do(down_under(.))
## Source: local data frame [5 x 9]
## Groups: <by row>
##
##   Col.A Col.B Col.C            F0            F1            F2        F3 ContainsANZ AllANZ
##   (chr) (int) (chr)         (chr)         (chr)         (chr)     (chr)       (lgl)  (lgl)
## 1  data     0   xxx     Australia     Singapore            NA        NA        TRUE  FALSE
## 2  data     1   yyy United States United States United States        NA       FALSE  FALSE
## 3  data     0   zzz     Australia     Australia     Australia Australia        TRUE   TRUE
## 4  data     0   ooo     Hong Kong        London     Australia        NA        TRUE  FALSE
## 5  data     1   xxx   New Zealand            NA            NA        NA        TRUE   TRUE
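On the NA question: yes, they do interfere with the chained == comparisons. In R, NA == 'Australia' is NA, and FALSE | NA is also NA, so a row with no match and at least one NA makes ifelse() return NA instead of falling through to the next branch. As a rough sketch of a way around both the NA problem and the repetition, the flags can be computed for all rows at once with %in% (which treats NA as a non-match) and rowSums(). The names anz, f_cols, is_anz and not_na are made up for this illustration, and the assumption that a row with no non-NA values should get AllANZ = FALSE is mine, not something stated in the question.

# Same example data as above, built directly so the snippet is self-contained
df <- data.frame(
  Col.A = rep("data", 5),
  Col.B = c(0L, 1L, 0L, 0L, 1L),
  Col.C = c("xxx", "yyy", "zzz", "ooo", "xxx"),
  F0 = c("Australia", "United States", "Australia", "Hong Kong", "New Zealand"),
  F1 = c("Singapore", "United States", "Australia", "London", NA),
  F2 = c(NA, "United States", "Australia", "Australia", NA),
  F3 = c(NA, NA, "Australia", NA, NA),
  stringsAsFactors = FALSE
)

anz    <- c("Australia", "New Zealand")   # values that count as a match
f_cols <- c("F0", "F1", "F2", "F3")       # extend this for the real data set

# Character matrix of just the F columns
vals <- as.matrix(df[f_cols])

# TRUE where a cell is an ANZ country; %in% returns FALSE for NA cells
is_anz <- matrix(vals %in% anz, nrow = nrow(vals))
# TRUE where a cell is not missing
not_na <- !is.na(vals)

# Any ANZ value in the row
df$ContainsANZ <- rowSums(is_anz) > 0
# Every non-NA value in the row is ANZ (assumed FALSE for an all-NA row)
df$AllANZ <- rowSums(not_na) > 0 & rowSums(is_anz) == rowSums(not_na)

This avoids rowwise() entirely, so it should scale better to many rows, and only f_cols needs to change when there are more F columns. If you want the country name or 'undefined' rather than TRUE/FALSE, as in the desired output, that would need an extra step on top of these logical columns.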