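I'm currently doing Bayesian Knowledge Tracing in R, and part of my code needs to drop students who have fewer than 3 instances on a given KC, otherwise the parameter estimation won't converge. To do that, I currently have: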
by_user = split(one_kc, one_kc$Anon.Student.id)
obs_by_user = sapply(by_user, nrow)
valid_users = names(obs_by_user[obs_by_user > 2])
student_outcomes = one_kc[one_kc$Anon.Student.id %in% valid_users,]
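But for some reason, when I look at by_user in my environment, the invalid users are still listed there, and when I try to run the curve fitting the values don't converge. I believe this is the cause. Where am I going wrong?
Edit: here is more of the code I'm currently using: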
df <- data.frame(read.table(file=file.choose(),na.strings="NA",sep="\t",quote="",header=TRUE, fill=TRUE))
df_subset <- df[,c(5,21,27,39,38)]
df_subset$Accuracy <- as.numeric(as.vector(df_subset$Accuracy))
df_subset <- na.omit(df_subset)
kc_list <- unique(df_subset$KC.Model.2A.)
#loop on the kc_list
for (kc in kc_list)
{
print(kc)
one_kc <- df_subset[ which(df_subset$KC.Model.2A.==kc), ]
one_kc <- one_kc[,c(1,3)]
# remove users with few observations on this skill
by_user = split(one_kc, one_kc$Anon.Student.id)
obs_by_user = sapply(by_user, nrow)
valid_users = names(obs_by_user[obs_by_user > 2])
student_outcomes = one_kc[one_kc$Anon.Student.id %in% valid_users,]
by_good_user = split(student_outcomes$Accuracy, student_outcomes$Anon.Student.id)
}
Answer 0 (score: 1)
If you need to speed up your code, you could also look at the data.table package:
library(data.table)
# convert the unfiltered per-KC data (one_kc); new_kc in the dplyr answer is already filtered
new_kc_dt <- as.data.table(one_kc)
new_kc_dt[, instances := .N, by = Anon.Student.id][instances >= 3]
# which is the same as
new_kc_dt[, instances := .N, by = Anon.Student.id]
new_kc_dt[instances >= 3]
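Note that neither form above stores the filtered rows anywhere. Here is a minimal sketch on toy data that assigns the result and then splits it per student. The data frame contents are made up for illustration, and `student_outcomes` and `by_good_user` are just the names from the question. It uses the `if (.N >= 3) .SD` idiom, which filters whole groups without adding an `instances` column.

```r
library(data.table)

# Hypothetical toy data standing in for one_kc (values are invented)
one_kc <- data.frame(
  Anon.Student.id = c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s3"),
  Accuracy        = c(1, 0, 1, 0, 1, 1, 1, 0, 1),
  stringsAsFactors = FALSE
)

one_kc_dt <- as.data.table(one_kc)

# For each student, return that student's rows only if there are at least 3 of them
student_outcomes <- one_kc_dt[, if (.N >= 3) .SD, by = Anon.Student.id]

# Split the remaining outcomes into one vector per student
by_good_user <- split(student_outcomes$Accuracy, student_outcomes$Anon.Student.id)

print(names(by_good_user))
```

Because the ids are kept as character here, `split()` only produces entries for students that still have rows.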
Answer 1 (score: 0)
library(dplyr)
new_kc <- one_kc %>%
group_by(Anon.Student.id) %>%
mutate(instances = n()) %>%
filter(instances >= 3)
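As for why invalid users still show up: in the question's code, by_user is built before any filtering, so it will always contain every student. A second possible cause, and this is an assumption since the question doesn't show the column types, is that Anon.Student.id is a factor. Filtering rows does not drop unused factor levels, so a later `split()` still returns an empty entry for each removed student, which could upset the fitting step. A base R sketch on invented toy data:

```r
# Hypothetical toy data; the id column is deliberately a factor
one_kc <- data.frame(
  Anon.Student.id = factor(c("s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s3")),
  Accuracy        = c(1, 0, 1, 0, 1, 1, 1, 0, 1)
)

# Number of rows per student, repeated on every row of that student
n_obs <- ave(seq_len(nrow(one_kc)), one_kc$Anon.Student.id, FUN = length)

# Keep only students with at least 3 observations
student_outcomes <- one_kc[n_obs >= 3, ]

# The level "s2" is still present, so split() keeps an empty entry for it
with_empty <- split(student_outcomes$Accuracy, student_outcomes$Anon.Student.id)

# drop = TRUE discards groups for levels that no longer occur
by_good_user <- split(student_outcomes$Accuracy, student_outcomes$Anon.Student.id,
                      drop = TRUE)

print(lengths(with_empty))
print(lengths(by_good_user))
```

Calling `droplevels()` on the filtered data frame before splitting should have the same effect as `drop = TRUE`.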