Add new columns if a range of columns contains a string in R

Asked: 2016-04-19 01:37:02

Tags: r

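I have a data frame like the one below. I want to add two columns:

ContainsANZ: indicates whether any of the columns F0 to F3 contains 'Australia' or 'New Zealand', ignoring NA values

AllANZ: indicates whether all of the non-NA columns contain 'Australia' or 'New Zealand'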
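
To make the two rules concrete, here is a minimal base R sketch for a single row. The vector `row_values` is a hypothetical stand-in for one row's F0 to F3 values, and the names `anz`, `non_na`, `contains_anz` and `all_anz` are illustrative only.

```r
# Hypothetical single row of F0..F3 values, used only for illustration
anz <- c("Australia", "New Zealand")
row_values <- c("Australia", "Singapore", NA, NA)

# Drop NA values before testing, as both rules ignore them
non_na <- row_values[!is.na(row_values)]

contains_anz <- any(non_na %in% anz)  # at least one non-NA value matches
all_anz      <- all(non_na %in% anz)  # every non-NA value matches
```

Note that `all()` on an empty vector is TRUE, so a row whose F columns are all NA would count as AllANZ under this sketch unless that case is handled separately.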

The starting data frame is:

dfContainsANZ
  Col.A Col.B Col.C            F0            F1            F2        F3
1  data     0   xxx     Australia     Singapore          <NA>      <NA>
2  data     1   yyy United States United States United States      <NA>
3  data     0   zzz     Australia     Australia     Australia Australia
4  data     0   ooo     Hong Kong        London     Australia      <NA>
5  data     1   xxx   New Zealand          <NA>          <NA>      <NA>

The final result should look like this:

df
  Col.A Col.B Col.C            F0            F1            F2        F3 ContainsANZ      AllANZ
1  data     0   xxx     Australia     Singapore          <NA>      <NA>   Australia   undefined
2  data     1   yyy United States United States United States      <NA>   undefined   undefined
3  data     0   zzz     Australia     Australia     Australia Australia   Australia   Australia
4  data     0   ooo     Hong Kong        London     Australia      <NA>   Australia   undefined
5  data     1   xxx   New Zealand          <NA>          <NA>      <NA> New Zealand New Zealand

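I am using dplyr (preferred for the solution) and came up with the code below, which doesn't work and is very repetitive. Is there a better way to write this so that I don't have to copy the F0 | F1 | F2 ... rule again? My real dataset has more of these columns. Are the NAs interfering with the code?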

df <- df %>%
mutate(ANZFlag = 
    ifelse(
    F0 == 'Australia' | 
    F1 == 'Australia' |
    F2 == 'Australia' | 
    F3 == 'Australia',
    'Australia', 
        ifelse(
        F0 == 'New Zealand' | 
        F1 == 'New Zealand' |
        F2 == 'New Zealand' | 
        F3 == 'New Zealand',
        'New Zealand', 'undefined'
        )
    )
)
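
On the NA question, the following base R sketch shows how NA flows through `==`, `|` and `ifelse()`. The variables `f0` and `f1` are hypothetical stand-ins for two cells from one row, not part of the original code.

```r
# Hypothetical cell values: one missing, one present
f0 <- NA_character_
f1 <- "Australia"

cmp_na   <- f0 == "Australia"               # comparing NA yields NA, not FALSE
or_true  <- cmp_na | (f1 == "Australia")    # NA | TRUE is TRUE
or_false <- cmp_na | (f1 == "New Zealand")  # NA | FALSE stays NA

# ifelse() returns NA wherever its test is NA, so neither branch is used
label <- ifelse(or_false, "New Zealand", "undefined")

# %in% treats NA as a non-match and returns FALSE instead of NA
in_test <- f0 %in% "Australia"
```

Under these semantics, a row such as row 2 or row 5 above, where the 'Australia' test is never TRUE and at least one column is NA, comes out as NA rather than 'undefined' or 'New Zealand'. Testing with `%in%` avoids this, because it returns FALSE for NA.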

1 Answer:

Answer 0 (score: 1)

It's still a bit of typing, but I think this gets at the essence of what you're looking for:

library(dplyr)

df <- read.table(text='Col.A,Col.B,Col.C,F0,F1,F2,F3
data,0,xxx,Australia,Singapore,NA,NA
data,1,yyy,"United States","United States","United States",NA
data,0,zzz,Australia,Australia,Australia,Australia
data,0,ooo,"Hong Kong",London,Australia,NA
data,1,xxx,"New Zealand",NA,NA,NA', header=TRUE, sep=",", stringsAsFactors=FALSE)

down_under <- function(x) {
  # Values that count as a match
  mtch <- c("Australia", "New Zealand")
  # Pull the F0..F3 values out of the one-row data frame as a named vector
  cols <- unlist(x)[c("F0", "F1", "F2", "F3")]
  # ContainsANZ: at least one target value appears in the row
  # AllANZ: every non-NA value in the row is one of the target values
  bind_cols(x, data_frame(ContainsANZ=any(mtch %in% cols, na.rm=TRUE),
                          AllANZ=all(as.vector(na.omit(cols)) %in% mtch)))
}

rowwise(df) %>% do(down_under(.))

## Source: local data frame [5 x 9]
## Groups: <by row>
## 
##   Col.A Col.B Col.C            F0            F1            F2        F3 ContainsANZ AllANZ
##   (chr) (int) (chr)         (chr)         (chr)         (chr)     (chr)       (lgl)  (lgl)
## 1  data     0   xxx     Australia     Singapore            NA        NA        TRUE  FALSE
## 2  data     1   yyy United States United States United States        NA       FALSE  FALSE
## 3  data     0   zzz     Australia     Australia     Australia Australia        TRUE   TRUE
## 4  data     0   ooo     Hong Kong        London     Australia        NA        TRUE  FALSE
## 5  data     1   xxx   New Zealand            NA            NA        NA        TRUE   TRUE
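
Since the real dataset has more F columns, a vectorized variant that picks the columns by name pattern may scale better than listing them. The sketch below swaps in base R instead of dplyr. It assumes the columns of interest are exactly those named `F` followed by digits, and the label columns `ContainsANZLabel` and `AllANZLabel` are hypothetical names. Using the first matching value as the label is also an assumption, since the question does not say what to show when a row mixes 'Australia' and 'New Zealand'.

```r
# Only the F columns of the example data, rebuilt here so the sketch is self-contained
df <- data.frame(
  F0 = c("Australia", "United States", "Australia", "Hong Kong", "New Zealand"),
  F1 = c("Singapore", "United States", "Australia", "London", NA),
  F2 = c(NA, "United States", "Australia", "Australia", NA),
  F3 = c(NA, NA, "Australia", NA, NA),
  stringsAsFactors = FALSE
)

anz <- c("Australia", "New Zealand")
# Assumption: the relevant columns are named F0, F1, F2, ...
f_cols <- grep("^F[0-9]+$", names(df), value = TRUE)

vals <- as.matrix(df[f_cols])                        # character matrix, one row per record
is_anz <- matrix(vals %in% anz, nrow = nrow(vals))   # %in% gives FALSE for NA
n_non_na <- rowSums(!is.na(vals))                    # count of non-NA cells per row

# Logical flags, matching the columns produced by the answer above
df$ContainsANZ <- rowSums(is_anz) > 0
df$AllANZ <- n_non_na > 0 & rowSums(is_anz) == n_non_na

# String labels in the shape the question asked for (first match, else "undefined")
first_match <- apply(vals, 1, function(r) {
  hit <- r[r %in% anz]
  if (length(hit) > 0) hit[[1]] else "undefined"
})
df$ContainsANZLabel <- first_match
df$AllANZLabel <- ifelse(df$AllANZ, first_match, "undefined")
```

Unlike the row-by-row version, this one treats a row with no non-NA values as not AllANZ, which is a design choice rather than something the question specifies.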