Exclude columns from an apply function based on column name

Date: 2019-07-25 13:31:39

Tags: r

I have some data:

df <- data.frame(v1 = c('word',NA,'word','word',NA,'word','word',NA,'word','word'), 
                 v1_open = c('word',NA,'word','word',NA,'word','word',NA,'word','word'),
                 v2 = c('word','word',NA,'word','word',NA,'word','word',NA,'word'), 
                 v2_open = c('word','word',NA,'word','word',NA,'word','word',NA,'word'))

I am using apply to change observations that are NA to 0 and all other observations to 1.

df <- t(apply(df,1,function(x){
  ifelse(is.na(x) ,0,1)
}))

This returns:

      v1 v1_open v2 v2_open
 [1,]  1       1  1       1
 [2,]  0       0  1       1
 [3,]  1       1  0       0
 [4,]  1       1  1       1
 [5,]  0       0  1       1
 [6,]  1       1  0       0
 [7,]  1       1  1       1
 [8,]  0       0  1       1
 [9,]  1       1  0       0
[10,]  1       1  1       1

I would like to modify the apply call so that it skips columns whose names contain the text '_open', resulting in:

      v1 v1_open v2 v2_open
 [1,]  1    word  1    word  
 [2,]  0    NA    1    word  
 [3,]  1    word  0    NA    
 [4,]  1    word  1    word  
 [5,]  0    NA    1    word  
 [6,]  1    word  0    NA    
 [7,]  1    word  1    word  
 [8,]  0    NA    1    word  
 [9,]  1    word  0    NA    
[10,]  1    word  1    word  

How can this be done?

3 answers:

Answer 0 (score: 3)

You can do:

library(dplyr)

df %>%
  mutate_at(vars(-contains("_open")),
            ~ +(!is.na(.)))

Output:

   v1 v1_open v2 v2_open
1   1    word  1    word
2   0    <NA>  1    word
3   1    word  0    <NA>
4   1    word  1    word
5   0    <NA>  1    word
6   1    word  0    <NA>
7   1    word  1    word
8   0    <NA>  1    word
9   1    word  0    <NA>
10  1    word  1    word
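
In current dplyr releases, mutate_at() is marked as superseded in favour of across(). As a rough sketch, assuming dplyr 1.0.0 or later and a shortened three-row version of the question's data (the name res is only illustrative), the same selection could be written like this:

```r
library(dplyr)

# Shortened version of the question's data, for illustration only
df <- data.frame(
  v1      = c("word", NA, "word"),
  v1_open = c("word", NA, "word"),
  v2      = c("word", "word", NA),
  v2_open = c("word", "word", NA)
)

# Recode every column whose name does NOT contain "_open":
# !is.na() gives TRUE/FALSE, and unary plus turns that into 1/0.
res <- df %>%
  mutate(across(-contains("_open"), ~ +(!is.na(.x))))

res
```

The '_open' columns pass through untouched because across() only rewrites the columns it selects.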

Answer 1 (score: 1)

We can apply is.na directly to the subset of data.frame columns, without any looping, and then update those columns:

nm1 <- grep("_open", names(df), value = TRUE, invert = TRUE)
df[nm1] <- +(!is.na(df[nm1]))
df
#   v1 v1_open v2 v2_open
#1   1    word  1    word
#2   0    <NA>  1    word
#3   1    word  0    <NA>
#4   1    word  1    word
#5   0    <NA>  1    word
#6   1    word  0    <NA>
#7   1    word  1    word
#8   0    <NA>  1    word
#9   1    word  0    <NA>
#10  1    word  1    word
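
For readers unfamiliar with the +(!is.na(...)) idiom used here, the following small sketch breaks it into steps. The objects x, present, flags, d and m are made up for illustration and are not part of the original answer.

```r
# A character vector with one missing value (illustrative data only)
x <- c("word", NA, "word")

present <- !is.na(x)   # logical vector: TRUE where a value is present
flags   <- +present    # unary plus coerces logical to integer (1/0)

# On a data.frame, is.na() returns a logical matrix. Unary plus keeps
# the matrix dimensions, whereas as.integer() would flatten it to a
# plain vector, which is why the idiom is convenient for whole subsets.
d <- data.frame(a = c("word", NA), b = c(NA, "word"))
m <- +(!is.na(d))
dim(m)
```

Because the result keeps its shape, it can be assigned straight back into the matching columns with df[nm1] <- ..., as done above.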

Answer 2 (score: 0)

If your columns alternate between .* and .*_open, you can simply subset them with a recycled c(TRUE, FALSE), i.e.

df[c(TRUE, FALSE)] <- +(!is.na(df[c(TRUE, FALSE)]))

df
#   v1 v1_open v2 v2_open
#1   1    word  1    word
#2   0    <NA>  1    word
#3   1    word  0    <NA>
#4   1    word  1    word
#5   0    <NA>  1    word
#6   1    word  0    <NA>
#7   1    word  1    word
#8   0    <NA>  1    word
#9   1    word  0    <NA>
#10  1    word  1    word
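
The positional approach depends on the column order, so it may be worth checking that assumption before relying on it. A minimal sketch, using a shortened copy of the question's data and the hypothetical helper names by_position and by_name, might look like this:

```r
# Shortened version of the question's data, for illustration only
df <- data.frame(
  v1      = c("word", NA, "word"),
  v1_open = c("word", NA, "word"),
  v2      = c("word", "word", NA),
  v2_open = c("word", "word", NA)
)

# Columns picked by position (c(TRUE, FALSE) is recycled across columns)
by_position <- names(df)[c(TRUE, FALSE)]

# Columns picked by name (everything that does not contain "_open")
by_name <- grep("_open", names(df), value = TRUE, invert = TRUE)

# Stop early if the alternating layout does not actually hold
stopifnot(identical(by_position, by_name))

df[by_name] <- +(!is.na(df[by_name]))
df
```

If the columns are ever reordered, the stopifnot() call fails instead of silently recoding the wrong columns.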