Filtering in R

Date: 2015-07-28 15:24:00

Tags: r list filtering curve-fitting bayesian-networks

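I am currently doing Bayesian Knowledge Tracing in R, and part of my code needs to drop students who have fewer than 3 instances on a given KC (knowledge component), because otherwise the parameter estimates will not converge. To do that, I currently have: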

by_user = split(one_kc, one_kc$Anon.Student.id)
obs_by_user = sapply(by_user, nrow)
valid_users = names(obs_by_user[obs_by_user > 2])
student_outcomes = one_kc[one_kc$Anon.Student.id %in% valid_users,]

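But for some reason, when I inspect by_user in my environment, the invalid users are still listed there, and when I try to run the curve fitting the values do not converge, which I believe is the cause. Where am I going wrong?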

Edit: here is more of the code I am currently using:

df <- data.frame(read.table(file=file.choose(),na.strings="NA",sep="\t",quote="",header=TRUE, fill=TRUE))


df_subset <- df[,c(5,21,27,39,38)]

df_subset$Accuracy <- as.numeric(as.vector(df_subset$Accuracy))

df_subset <- na.omit(df_subset)

kc_list <- unique(df_subset$KC.Model.2A.)
#loop on the kc_list
for (kc in kc_list)
  {
  print(kc)
  one_kc <- df_subset[ which(df_subset$KC.Model.2A.==kc), ]
  one_kc <- one_kc[,c(1,3)]
  # remove users with few observations on this skill
  by_user = split(one_kc, one_kc$Anon.Student.id)
  obs_by_user = sapply(by_user, nrow)
  valid_users = names(obs_by_user[obs_by_user > 2])
  student_outcomes = one_kc[one_kc$Anon.Student.id %in% valid_users,]

  by_good_user = split(student_outcomes$Accuracy, student_outcomes$Anon.Student.id)
}

2 Answers:

Answer 0: (score: 1)

If you need to speed up your code, you could also look at the data.table package:

library(data.table)
# one_kc is the per-skill data frame built inside the loop in the question
new_kc_dt <- as.data.table(one_kc)

new_kc_dt[, instances := .N, by = Anon.Student.id][instances >= 3]

# which is the same as 
new_kc_dt[, instances := .N, by = Anon.Student.id]
new_kc_dt[instances >= 3]
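
For comparison, the same "count rows per student, then keep students with at least 3" idea can be sketched in base R without any extra package. The toy data below is made up for illustration; only the column names `Anon.Student.id` and `Accuracy` come from the question.

```r
# Hypothetical toy data: s1 has 3 rows, s2 has 1 row, s3 has 4 rows.
one_kc <- data.frame(
  Anon.Student.id = c("s1", "s1", "s1", "s2", "s3", "s3", "s3", "s3"),
  Accuracy        = c(1, 0, 1, 0, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

# ave() applies length() within each student's group and returns
# one value per row, i.e. the number of rows that student has.
instances <- ave(seq_len(nrow(one_kc)), one_kc$Anon.Student.id, FUN = length)

# Keep only rows belonging to students with at least 3 observations.
student_outcomes <- one_kc[instances >= 3, ]
print(student_outcomes)
```

This is likely slower than data.table on large logs, but it avoids the intermediate `split()` and `sapply()` steps from the question.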

Answer 1: (score: 0)

library(dplyr)
new_kc <- one_kc %>%
    group_by(Anon.Student.id) %>%
    mutate(instances = n()) %>%
    filter(instances >= 3)
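
Neither snippet explains why invalid users can still show up after filtering. One plausible cause, assuming `Anon.Student.id` was read in as a factor (the default for `read.table` in R versions before 4.0), is that subsetting rows does not remove unused factor levels, so `split()` still returns an empty element for every dropped student. Note also that `by_user` itself is built before the filter, so it will always list every student. The sketch below reproduces this on made-up toy data and shows two ways to drop the empty groups.

```r
# Hypothetical toy data: s1 has 3 rows, s2 has 1 row, s3 has 4 rows.
one_kc <- data.frame(
  Anon.Student.id = factor(c("s1", "s1", "s1", "s2", "s3", "s3", "s3", "s3")),
  Accuracy        = c(1, 0, 1, 0, 1, 1, 0, 1)
)

# Same filtering steps as in the question.
by_user     <- split(one_kc, one_kc$Anon.Student.id)
obs_by_user <- sapply(by_user, nrow)
valid_users <- names(obs_by_user[obs_by_user > 2])
student_outcomes <- one_kc[one_kc$Anon.Student.id %in% valid_users, ]

# The rows for s2 are gone, but the factor still carries the level "s2",
# so split() returns an empty element for it.
with_empty <- split(student_outcomes$Accuracy, student_outcomes$Anon.Student.id)
print(names(with_empty))

# Option 1: drop unused factor levels before splitting.
student_outcomes <- droplevels(student_outcomes)
by_good_user <- split(student_outcomes$Accuracy, student_outcomes$Anon.Student.id)

# Option 2 (equivalent here): ask split() to discard empty groups.
# by_good_user <- split(student_outcomes$Accuracy,
#                       student_outcomes$Anon.Student.id, drop = TRUE)
print(names(by_good_user))
```

If the fitting routine iterates over `by_good_user`, the empty vectors for dropped students may be what prevents convergence; whether that is the actual cause in the original data cannot be confirmed from the question alone.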