I have a data frame similar to this, with dates as column names:
2013_11 | 2013_12 | 2014_01 | 2014_02 | 2014_03 |
NA | NA | 3 | 3 | N |
2 | 2 | 3 | NA | NA |
NA | NA | NA | NA | NA |
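I need to write some kind of logical function to filter for the rows I am looking for. I need to extract only the rows that have no number in any month of 2013 (the first two columns) but DO have at least one number in any of the 2014 columns.
So the code should return only the first row: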
NA | NA | 3 | 3 | N |
I can't work out the most efficient way to do this, as I have around 8 million rows.
Answer 0 (score: 2)
You could try
indx1 <- grep('2013', colnames(df))
indx2 <- grep('2014', colnames(df))
df[!rowSums(!is.na(df[indx1]))&!!rowSums(!is.na(df[indx2])),]
# 2013_11 2013_12 2014_01 2014_02 2014_03
#1 NA NA 3 3 N
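Or you could use
# TRUE where every 2013 column is NA
i1 <- Reduce(`&`, lapply(df[indx1], is.na))
# TRUE where at least one 2014 column is non-NA (`|`, not `&`, so one value is enough)
i2 <- Reduce(`|`, lapply(df[indx2], function(x) !is.na(x)))
df[i1 & i2, ]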
# 2013_11 2013_12 2014_01 2014_02 2014_03
#1 NA NA 3 3 N
df <- structure(list(`2013_11` = c(NA, 2L, NA), `2013_12` = c(NA, 2L,
NA), `2014_01` = c(3L, 3L, NA), `2014_02` = c(3L, NA, NA), `2014_03` = c("N",
NA, NA)), .Names = c("2013_11", "2013_12", "2014_01", "2014_02",
"2014_03"), class = "data.frame", row.names = c(NA, -3L))
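To make the logic of the first approach easier to follow, here is a minimal sketch that wraps it in a function and names the intermediate steps. The helper name year_mask and its arguments are hypothetical, and the toy data uses all-numeric columns, which is an assumption made only to keep the example short.

```r
# Toy data mirroring the question (all-numeric columns are an assumption).
df <- data.frame(
  `2013_11` = c(NA, 2, NA),
  `2013_12` = c(NA, 2, NA),
  `2014_01` = c(3, 3, NA),
  `2014_02` = c(3, NA, NA),
  `2014_03` = c(NA, NA, NA),
  check.names = FALSE
)

# Hypothetical helper: TRUE for rows with no value in the `empty_year` columns
# and at least one value in the `filled_year` columns.
year_mask <- function(data, empty_year, filled_year) {
  empty_cols  <- grep(empty_year,  colnames(data), fixed = TRUE)
  filled_cols <- grep(filled_year, colnames(data), fixed = TRUE)
  n_empty  <- rowSums(!is.na(data[empty_cols]))   # non-NA count per row
  n_filled <- rowSums(!is.na(data[filled_cols]))  # non-NA count per row
  n_empty == 0 & n_filled > 0
}

mask <- year_mask(df, "2013", "2014")
result <- df[mask, , drop = FALSE]
```

Because rowSums and is.na work on whole columns at once, this style avoids an explicit loop over rows, which is usually what matters with millions of rows.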
Answer 1 (score: 0)
Have you considered using grep? I would create a function to do this, as shown below, using R's all, any, is.na, and if statements inside a for loop.
grep_function <- function(src, condition1, condition2) {
  # Locate the columns for each condition once, outside the loop
  cols1 <- grepl(condition1, names(src))
  cols2 <- grepl(condition2, names(src))
  for (i in seq_len(nrow(src))) {
    data_condition1 <- src[i, cols1]
    data_condition2 <- src[i, cols2]
    if (all(is.na(data_condition1)) && any(!is.na(data_condition2))) {
      # do something here to each individual observation
    } else {
      # do something for those that do not meet your criteria
    }
  }
}
Example: grep_function(your-data-here, "2013", "2014")
Answer 2 (score: 0)
Or you could use SQL (it is a bit verbose, but may be more readable for some people):
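As one way the placeholder branches might be filled in, the sketch below returns a logical vector marking the rows that meet both conditions. The name row_matches and the toy data frame a are assumptions for illustration, not part of the answer above. Indexing a data frame one row at a time is likely to be slow at the 8 million rows mentioned in the question.

```r
# Toy data (an assumption), with check.names = FALSE to keep the date-style names.
a <- data.frame(
  `2013_11` = c(NA, 2, NA),
  `2013_12` = c(NA, 2, NA),
  `2014_01` = c(3, 3, NA),
  `2014_02` = c(3, NA, NA),
  `2014_03` = c(NA, NA, NA),
  check.names = FALSE
)

# Hypothetical helper: one TRUE/FALSE per row of `src`.
row_matches <- function(src, condition1, condition2) {
  cols1 <- grepl(condition1, names(src))
  cols2 <- grepl(condition2, names(src))
  vapply(seq_len(nrow(src)), function(i) {
    # all condition1 columns missing AND at least one condition2 column present
    all(is.na(src[i, cols1])) && any(!is.na(src[i, cols2]))
  }, logical(1))
}

keep <- row_matches(a, "2013", "2014")
matched <- a[keep, , drop = FALSE]
```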
require('sqldf')
a=data.frame("2013_11"=c(NA,2,NA), "2013_12"=c(NA,2,NA), "2014_01" =c(3,3,NA),
"2014_02" =c(3,NA,NA) ,"2014_03" =c(NA,NA,NA))
sqldf("select * from a where
case when X2013_11 is null then 0 else 1 end +
case when X2013_12 is null then 0 else 1 end = 0
and
case when X2014_01 is null then 0 else 1 end +
case when X2014_02 is null then 0 else 1 end +
case when X2014_03 is null then 0 else 1 end > 0
")
X2013_11 X2013_12 X2014_01 X2014_02 X2014_03
NA NA 3 3 NA
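Since the query repeats the same case expression for every column, it could be generated from the column names instead of typed by hand. The following is a minimal sketch that only builds the query string. The helper count_non_null is hypothetical, and the column names are assumed to be the X-prefixed ones that data.frame produces above.

```r
# Column names as they appear after data.frame() prefixes them with "X".
cols <- c("X2013_11", "X2013_12", "X2014_01", "X2014_02", "X2014_03")

# Hypothetical helper: SQL expression counting the non-null values among `columns`.
count_non_null <- function(columns) {
  paste(sprintf("case when %s is null then 0 else 1 end", columns),
        collapse = " + ")
}

cols_2013 <- grep("2013", cols, value = TRUE)
cols_2014 <- grep("2014", cols, value = TRUE)

# No 2013 values, and at least one 2014 value.
query <- sprintf("select * from a where (%s) = 0 and (%s) > 0",
                 count_non_null(cols_2013), count_non_null(cols_2014))
```

The resulting string can then be passed to sqldf(query) with the data frame a in scope.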