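I am trying to write some code that loops through a dataset and returns the last completed row and column as a check.
The first column of the dataset holds the students' names; the remaining columns represent the sections of a completed program. In each row, the last column that contains data represents the last section that student completed.
The output should therefore be a vector like this: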
name, sections that are blank and contain NA (i.e. 1_1 or 1_3 etc.)
name, sections that are blank and contain NA
name, sections that are blank and contain NA etc.
Here is the data frame:
df<-data.frame(list(names = c("John", "Peter", "Steve"),
'1_1' = c("yes", "yes","yes"),
'1_2' = c("yes", "yes", ""),
'1_3' = c("", "yes", "yes"),
'1_4' = c("", "","yes"),
'1_5' = c(NA, NA, NA)),
row.names = NULL)
df
#   names X1_1 X1_2 X1_3 X1_4 X1_5
# 1  John  yes  yes             NA
# 2 Peter  yes  yes  yes        NA
# 3 Steve  yes       yes  yes   NA
# Replace empty strings with NA. The pattern has to be "^$"; a pattern such as
# "^|^" matches the start of every string, so gsub would turn every cell into NA.
df<-apply(df, 2, function(x) gsub("^$", NA, x))
After the gsub, which is meant to fill the blank cells with NA, I get the following:
     names   X1_1  X1_2  X1_3  X1_4  X1_5
[1,] "John"  "yes" "yes" NA    NA    NA
[2,] "Peter" "yes" "yes" "yes" NA    NA
[3,] "Steve" "yes" NA    "yes" "yes" NA
Here is the code I tried to use:
sapply(df,function(x)
which(x== if(df>2)
{
list(df[,])
}
))
The output I want is (as described above) a vector with the following contents:
name, all fields that contain NA (i.e. 1_1, 1_2 etc.)
name, all fields that contain NA (i.e. 1_1, 1_2 etc.)
etc.
The R output I get is:
$John integer(0)
$Peter integer(0)
$Steve integer(0)
$yes integer(0)
$yes integer(0)
$yes integer(0)
$yes integer(0)
$yes integer(0)
$ integer(0)
$ integer(0)
$yes integer(0)
$yes integer(0)
$ integer(0)
$ integer(0)
$yes integer(0)
$ integer(0)
$ integer(0)
$ integer(0)
So it does not work at all. Any pointers?
Thanks again.
Levi
Answer 0 (score: 1)
Here is a data.table solution:
require(data.table)
df<-data.frame(list(names = c("John", "Peter", "Steve"),
'1_1' = c("yes", "yes","yes"),
'1_2' = c("yes", "yes", ""),
'1_3' = c("", "yes", "yes"),
'1_4' = c("", "","yes"),
'1_5' = c(NA, NA, NA)),
row.names = NULL)
dt <- as.data.table(df)
# Instead of using gsub, have a function that sets values True
# if the cell value != 'yes' or is NA.
dt.i <- dt[, lapply(.SD, function(x) x != 'yes' | is.na(x)), by=names]
# See dt.i:
#    names  X1_1  X1_2  X1_3  X1_4 X1_5
# 1:  John FALSE FALSE  TRUE  TRUE TRUE
# 2: Peter FALSE FALSE FALSE  TRUE TRUE
# 3: Steve FALSE  TRUE FALSE FALSE TRUE
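# For each name, keep the names of the columns flagged TRUE
# (TRUE is spelled out because T can be reassigned).
dt.i[, list(list(names(.SD)[which(.SD == TRUE)])), by=names]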
which yields
   names             V1
1:  John X1_3,X1_4,X1_5
2: Peter      X1_4,X1_5
3: Steve      X1_2,X1_5
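For comparison, here is a minimal base R sketch of the same idea that returns a plain character vector in the "name, sections" shape the question asks for. It rebuilds the data frame so that it runs on its own; the names `sections`, `incomplete` and `result` are illustrative and not from the original post, and `stringsAsFactors = FALSE` is an assumption made so that the names stay character on older R versions.

```r
# Same data as in the question (column names become X1_1 ... X1_5).
df <- data.frame(list(names = c("John", "Peter", "Steve"),
                      '1_1' = c("yes", "yes", "yes"),
                      '1_2' = c("yes", "yes", ""),
                      '1_3' = c("", "yes", "yes"),
                      '1_4' = c("", "", "yes"),
                      '1_5' = c(NA, NA, NA)),
                 row.names = NULL, stringsAsFactors = FALSE)

sections <- df[-1]  # every column except the names

# Logical matrix: TRUE where a cell is NA or anything other than "yes".
incomplete <- is.na(sections) | sections != "yes"

# One string per student: the name followed by the flagged column names.
result <- vapply(seq_len(nrow(df)), function(i) {
  paste(c(df$names[i], colnames(incomplete)[incomplete[i, ]]), collapse = ", ")
}, character(1))

result
# [1] "John, X1_3, X1_4, X1_5" "Peter, X1_4, X1_5"      "Steve, X1_2, X1_5"
```

Testing `is.na()` first matters here: comparing NA with "yes" gives NA, and `TRUE | NA` evaluates to TRUE, so the all-NA column is still flagged.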
If you store this structure as
dt.final <- dt.i[, list(list(names(.SD)[which(.SD == TRUE)])), by=names]
you can access the information with, for example,
dt.final[names == 'John']
#    names             V1
# 1:  John X1_3,X1_4,X1_5
dt.final[names == 'John']$V1
# [[1]]
# [1] "X1_3" "X1_4" "X1_5"
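The question also mentions that the last column containing data in each row is the last section the student completed. A rough base R sketch of that check follows, assuming "contains data" means the cell equals "yes"; the names `done` and `last_done` are hypothetical and not part of the original answer.

```r
# Same data as in the question (column names become X1_1 ... X1_5).
df <- data.frame(list(names = c("John", "Peter", "Steve"),
                      '1_1' = c("yes", "yes", "yes"),
                      '1_2' = c("yes", "yes", ""),
                      '1_3' = c("", "yes", "yes"),
                      '1_4' = c("", "", "yes"),
                      '1_5' = c(NA, NA, NA)),
                 row.names = NULL, stringsAsFactors = FALSE)

sections <- df[-1]

# TRUE only where a section is marked "yes"; NA cells count as not done.
done <- !is.na(sections) & sections == "yes"

# For each row, the name of the right-most completed column (NA if none).
last_done <- apply(done, 1, function(r) {
  idx <- which(r)
  if (length(idx) == 0) NA_character_ else colnames(done)[max(idx)]
})
names(last_done) <- df$names

last_done
#   John  Peter  Steve
# "X1_2" "X1_3" "X1_4"
```

Note that Steve's result is X1_4 even though X1_2 is blank, because this sketch only looks for the right-most "yes"; combine it with the list of incomplete sections if gaps matter.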