R column comparison and filtering

Asked: 2015-01-14 19:54:22

Tags: r filter

I have a data frame similar to this one, with dates as the column names:

2013_11 | 2013_12 | 2014_01 | 2014_02 | 2014_03 |

 NA | NA | 3  | 3  | N  |
  2 | 2  | 3  | NA | NA |
 NA | NA | NA | NA | NA |

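I need to write some kind of logical function to filter for the rows I'm looking for. I need to pull only the rows that have no number in any month of 2013 (the first two columns) but DO have at least one number in any of the 2014 columns.

So the code would pull back only the first row for me: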

NA | NA | 3  | 3  | N  |

I can't work out the most efficient way to do this, since I have about 8 million rows.

3 answers:

Answer 0: (score: 2)

You could try

indx1 <- grep('2013', colnames(df))
indx2 <- grep('2014', colnames(df))
df[!rowSums(!is.na(df[indx1]))&!!rowSums(!is.na(df[indx2])),]
#   2013_11 2013_12 2014_01 2014_02 2014_03
#1      NA      NA       3       3       N
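The one-liner above packs several steps into a single expression. As a rough sketch, it can be unpacked into named steps; the variable names (`cols_2013`, `n_2014`, `keep`, `result`) and the `^` anchors in the patterns are illustrative choices, not part of the original answer, and the data is rebuilt from the sample given at the end of this answer.

```r
# Sample data from this answer (the "N" in 2014_03 is kept as posted)
df <- data.frame(
  `2013_11` = c(NA, 2L, NA),
  `2013_12` = c(NA, 2L, NA),
  `2014_01` = c(3L, 3L, NA),
  `2014_02` = c(3L, NA, NA),
  `2014_03` = c("N", NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Column positions for each year, matched on the start of the name
cols_2013 <- grep("^2013", colnames(df))
cols_2014 <- grep("^2014", colnames(df))

# Count the non-missing cells per row within each year
n_2013 <- rowSums(!is.na(df[cols_2013]))
n_2014 <- rowSums(!is.na(df[cols_2014]))

# Keep rows with nothing in 2013 and at least one value in 2014
keep <- n_2013 == 0 & n_2014 > 0
result <- df[keep, , drop = FALSE]
```

Note that `is.na` treats the string "N" as a present value, so it counts towards the 2014 total here. If "N" is really a truncated NA in your data, it would need to be converted first.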

Or you could use

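# TRUE where every 2013 column is NA
i1 <- Reduce(`&`, lapply(df[indx1], is.na))
# TRUE where at least one 2014 column is non-NA (`|` rather than `&`, so one value is enough)
i2 <- Reduce(`|`, lapply(df[indx2], function(x) !is.na(x)))
df[i1 & i2, ]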
# 2013_11 2013_12 2014_01 2014_02 2014_03
#1      NA      NA       3       3       N

Data

df <- structure(list(`2013_11` = c(NA, 2L, NA), `2013_12` = c(NA, 2L, 
NA), `2014_01` = c(3L, 3L, NA), `2014_02` = c(3L, NA, NA), `2014_03` = c("N", 
NA, NA)), .Names = c("2013_11", "2013_12", "2014_01", "2014_02", 
"2014_03"), class = "data.frame", row.names = c(NA, -3L))

Answer 1: (score: 0)

Have you considered using grep? I would write a function to do this, like the one below, using R's all, any, is.na and if statements inside a for loop.

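grep_function <- function(src, condition1, condition2) {
    # Locate the two column groups once, outside the loop
    cols1 <- grepl(condition1, names(src))
    cols2 <- grepl(condition2, names(src))
    keep <- logical(nrow(src))
    for (i in seq_len(nrow(src))) {
        data_condition1 <- src[i, cols1]
        data_condition2 <- src[i, cols2]
        if (all(is.na(data_condition1)) && any(!is.na(data_condition2))) {
            # do something here to each individual observation
            keep[i] <- TRUE
        } else {
            # do something for those that do not meet your criteria
            keep[i] <- FALSE
        }
    }
    # Return only the rows that met both conditions
    src[keep, , drop = FALSE]
}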

Example: grep_function(your_data_here, "2013", "2014")

Answer 2: (score: 0)

Or you could use SQL (it's a bit verbose, but may be more readable to some):

require('sqldf')

a=data.frame("2013_11"=c(NA,2,NA), "2013_12"=c(NA,2,NA), "2014_01" =c(3,3,NA),
             "2014_02" =c(3,NA,NA) ,"2014_03" =c(NA,NA,NA))

sqldf("select * from a where 
        case when X2013_11 is null then 0 else 1 end +
        case when X2013_12 is null then 0 else 1 end = 0 
        and
        case when X2014_01 is null then 0 else 1 end +
        case when X2014_02 is null then 0 else 1 end +
        case when X2014_03 is null then 0 else 1 end > 0
      ")

 X2013_11 X2013_12 X2014_01 X2014_02 X2014_03
       NA       NA        3        3       NA
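With more month columns, writing each `case when` term by hand gets tedious. As a minimal sketch, the same query text could be assembled from the column names; `count_non_null` is a hypothetical helper introduced here for illustration, and the column names assume the `X` prefix that `data.frame` adds, as in the example above.

```r
# Hypothetical helper: one "case when ... end" term per column, summed with "+"
count_non_null <- function(cols) {
  paste(sprintf("case when %s is null then 0 else 1 end", cols),
        collapse = " + ")
}

# Column names as they appear in the data frame `a` above
cols <- c("X2013_11", "X2013_12", "X2014_01", "X2014_02", "X2014_03")
cols_2013 <- grep("^X2013", cols, value = TRUE)
cols_2014 <- grep("^X2014", cols, value = TRUE)

# No non-null value in 2013, at least one non-null value in 2014
query <- sprintf("select * from a where %s = 0 and %s > 0",
                 count_non_null(cols_2013), count_non_null(cols_2014))
```

The resulting string can then be passed to `sqldf(query)` with the data frame `a` in scope; in practice `cols` would come from `names(a)` rather than being typed out.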