Return all NAs along with their rows and columns

Time: 2015-10-19 22:37:29

Tags: r

I'm trying to write some code that loops through a data set and returns the last completed row and column, as a checker.

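The first column of the data set is the student's name, and the remaining columns represent the sections of the program that have been completed. In each row, the last column that contains data represents the last section the student completed.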
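If the goal is the "last completed section" described above, a minimal base R sketch could look like this. It assumes a completed section is marked with the string "yes", treats the right-most "yes" as the last completed section, uses stringsAsFactors = FALSE, and uses the X1_1 to X1_5 column names that data.frame() generates from '1_1' to '1_5'. last_completed() is a hypothetical helper name, not something from the question.

```r
# Same data as in the question; X1_1 ... X1_5 are the names data.frame()
# produces from '1_1' ... '1_5' via check.names
df <- data.frame(names = c("John", "Peter", "Steve"),
                 X1_1 = c("yes", "yes", "yes"),
                 X1_2 = c("yes", "yes", ""),
                 X1_3 = c("", "yes", "yes"),
                 X1_4 = c("", "", "yes"),
                 X1_5 = c(NA, NA, NA),
                 stringsAsFactors = FALSE)

# Hypothetical helper: name of the right-most section marked "yes" per student
last_completed <- function(d) {
  done <- as.matrix(d[, -1]) == "yes"   # NA cells compare to NA
  done[is.na(done)] <- FALSE            # treat NA as "not completed"
  res <- apply(done, 1, function(r) {
    if (any(r)) names(r)[max(which(r))] else NA_character_
  })
  setNames(res, d$names)
}

last_completed(df)
```

For the sample data this gives X1_2 for John, X1_3 for Peter and X1_4 for Steve (Steve skipped X1_2, but X1_4 is still his right-most completed section).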

So the output should be a vector like this:

 name, sections that are blank and contain NA (e.g. 1_1 or 1_3 etc.) 
 name, sections that are blank and contain NA 
 name, sections that are blank and contain NA etc.

Here is the data frame:

df<-data.frame(list(names = c("John", "Peter", "Steve"),
                    '1_1' = c("yes", "yes","yes"),
                    '1_2' = c("yes", "yes", ""),
                    '1_3' = c("", "yes", "yes"),
                    '1_4' = c("", "","yes"),
                    '1_5' = c(NA, NA, NA)), 
               row.names = NULL)
df
#   names X1_1 X1_2 X1_3 X1_4 X1_5
# 1  John  yes  yes             NA
# 2 Peter  yes  yes  yes        NA
# 3 Steve  yes       yes  yes   NA

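# Replace empty (or single-space) cells with NA. The pattern must be anchored
# at both ends: a bare "^" matches the start of every string, so
# gsub("^|^", NA, x) turns every cell, names included, into NA.
df <- apply(df, 2, function(x) gsub("^$|^ $", NA, x))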

After the gsub I get the following, i.e. the blank cells are filled with NA:

     names   X1_1  X1_2  X1_3  X1_4  X1_5
[1,] "John"  "yes" "yes" NA    NA    NA  
[2,] "Peter" "yes" "yes" "yes" NA    NA  
[3,] "Steve" "yes" NA    "yes" "yes" NA  
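A regex-free alternative to the gsub step, as a sketch: it replaces empty or whitespace-only cells with NA column by column. Unlike apply(), which returns a character matrix, assigning into df[] keeps df a data frame. The data is re-created here with stringsAsFactors = FALSE and with the X1_* column names R generates from '1_1' to '1_5'.

```r
df <- data.frame(names = c("John", "Peter", "Steve"),
                 X1_1 = c("yes", "yes", "yes"),
                 X1_2 = c("yes", "yes", ""),
                 X1_3 = c("", "yes", "yes"),
                 X1_4 = c("", "", "yes"),
                 X1_5 = c(NA, NA, NA),
                 stringsAsFactors = FALSE)

# Replace cells that are empty or whitespace-only with NA, column by column.
# Existing NAs are left alone (!is.na(x) is FALSE for them), and
# df[] <- lapply(...) keeps the result a data.frame.
df[] <- lapply(df, function(x) replace(x, !is.na(x) & trimws(x) == "", NA))
df
```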

Here is the code I tried to use:

sapply(df, function(x)
  which(x == if (df > 2) {
    list(df[, ])
  }))

The output I want is (as described above) a vector with the following:

name, all fields that contain NA (e.g. 1_1, 1_2 etc.)
name, all fields that contain NA (e.g. 1_1, 1_2 etc.)
etc.
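For this output, and for the "rows and columns" in the title, one base R option is which(..., arr.ind = TRUE), which returns the row and column index of every NA, followed by split() to group the column names per student. This is a sketch that assumes the blanks have already been converted to NA (as in the gsub output above) and that student names are unique.

```r
# Data after the blank-to-NA conversion (as in the gsub output above)
df <- data.frame(names = c("John", "Peter", "Steve"),
                 X1_1 = c("yes", "yes", "yes"),
                 X1_2 = c("yes", "yes", NA),
                 X1_3 = c(NA, "yes", "yes"),
                 X1_4 = c(NA, NA, "yes"),
                 X1_5 = c(NA, NA, NA),
                 stringsAsFactors = FALSE)

sections <- as.matrix(df[, -1])

# One row per NA cell, with its "row" and "col" index
pos <- which(is.na(sections), arr.ind = TRUE)
pos

# Group the NA column names by student; using the original names as factor
# levels keeps the original row order and keeps students without any NA
# (they get character(0))
na_fields <- split(colnames(sections)[pos[, "col"]],
                   factor(df$names[pos[, "row"]], levels = df$names))
na_fields
```

For the sample data, na_fields$John is c("X1_3", "X1_4", "X1_5"), which matches the data.table result in the answer.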

The R output I get is:

$John integer(0)

$Peter integer(0)

$Steve integer(0)

$yes integer(0)

$yes integer(0)

$yes integer(0)

$yes integer(0)

$yes integer(0)

$ integer(0)

$ integer(0)

$yes integer(0)

$yes integer(0)

$ integer(0)

$ integer(0)

$yes integer(0)

$ integer(0)

$ integer(0)

$ integer(0)

So it doesn't work at all. Any pointers?
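A likely reason for the one-entry-per-cell output above (an assumption based on documented base R behaviour, not something the answer states): apply() returns a character matrix, and sapply() over a matrix visits each individual cell. Because the input is character, USE.NAMES = TRUE names each result after the cell value itself, which would explain names like $John, $yes and the empty $ entries. The small data frame below is a made-up stand-in for the question's data.

```r
# apply() coerces the data frame to a character matrix, as the gsub step does
m <- apply(data.frame(a = c("x", "y"), b = c("z", ""),
                      stringsAsFactors = FALSE), 2, identity)
is.matrix(m)   # TRUE

# sapply() walks over the individual cells (column by column) and, since the
# input is character, names each result after the cell value, including an
# empty name for ""
res <- sapply(m, nchar)
length(res)    # 4
names(res)     # "x" "y" "z" ""
```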

Thanks again.

Levi

1 Answer:

Answer 0 (score: 1)

Here is a data.table solution:

library(data.table)
df<-data.frame(list(names = c("John", "Peter", "Steve"),
                '1_1' = c("yes", "yes","yes"),
                '1_2' = c("yes", "yes", ""),
                '1_3' = c("", "yes", "yes"),
                '1_4' = c("", "","yes"),
                '1_5' = c(NA, NA, NA)), 
           row.names = NULL)
dt <- as.data.table(df)
# Instead of using gsub, use a function that returns TRUE
# if the cell value is not 'yes' or is NA.
dt.i <- dt[, lapply(.SD, function(x) x != 'yes' | is.na(x)), by=names]
# See dt.i:
#    names  X1_1  X1_2  X1_3  X1_4 X1_5
# 1:  John FALSE FALSE  TRUE  TRUE TRUE
# 2: Peter FALSE FALSE FALSE  TRUE TRUE
# 3: Steve FALSE  TRUE FALSE FALSE TRUE
# Collect, for each student, the names of the columns flagged TRUE
dt.i[, list(list(names(.SD)[which(.SD == TRUE)])), by=names]

which yields:

   names             V1
1:  John X1_3,X1_4,X1_5
2: Peter      X1_4,X1_5
3: Steve      X1_2,X1_5

If you store this structure as

dt.final <- dt.i[, list(list(names(.SD)[which(.SD == TRUE)])), by=names]

you can access the information with, for example:
dt.final[names == 'John']
#    names             V1
# 1:  John X1_3,X1_4,X1_5
dt.final[names == 'John']$V1
# [[1]]
# [1] "X1_3" "X1_4" "X1_5"
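If data.table is not available, here is a base R sketch of the same rule (a section counts as incomplete if it is not "yes" or is NA). It returns a named list, so the lookup works much like dt.final[names == 'John']$V1. The data uses stringsAsFactors = FALSE and the X1_* column names R generates; incomplete_by_student is a hypothetical name.

```r
df <- data.frame(names = c("John", "Peter", "Steve"),
                 X1_1 = c("yes", "yes", "yes"),
                 X1_2 = c("yes", "yes", ""),
                 X1_3 = c("", "yes", "yes"),
                 X1_4 = c("", "", "yes"),
                 X1_5 = c(NA, NA, NA),
                 stringsAsFactors = FALSE)

sections <- as.matrix(df[, -1])

# TRUE where the section is not "yes" or is missing; for NA cells,
# NA != "yes" is NA, and NA | TRUE is TRUE
incomplete <- sections != "yes" | is.na(sections)

# Column names of the incomplete sections, one list element per student
incomplete_by_student <- lapply(seq_len(nrow(sections)),
                                function(i) colnames(sections)[incomplete[i, ]])
names(incomplete_by_student) <- df$names

incomplete_by_student[["John"]]
```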