I have 20 workers, each of whom completes 100 tasks. I have generated a true answer for each task, which is one of 5 possible answers:
answers <- c("liver", "blood", "lung", "brain", "heart")
truth <- sample(answers, no.tasks, replace = TRUE, prob = c(0.2, 0.2, 0.2, 0.2, 0.2))
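My dataSet has the columns workerID, taskID and truth. Now I need to generate another vector that simulates what the workers would answer, according to some probability. For example, if the truth for task 1, worker 1 is "liver", I want worker 1 to answer "liver" for task 1 with high probability. Likewise, I want a simulated worker answer for each of the five answers across all 2000 tasks. To do this, I am using the following for loop with if statements.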
for (i in nrow(dataSet)){
  if (dataSet$truth[i] == "liver")
  {
    df <- (rep(sample(answers, no.tasks, prob = c(0.9, 0.02, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "blood")
  {
    df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.9, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "lung")
  {
    df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.9, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "brain")
  {
    df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.9, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "heart")
  {
    df <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.02, 0.9), no.workers)))
  } else {
    df <- (rep(sample(answers, no.tasks, prob = c(0.2, 0.2, 0.2, 0.2, 0.2), no.workers)))
  }
}
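However, since the truth for my task 1 is "brain", the output vector df contains a lot of answers that are "brain". Can someone hint at what is going wrong here?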
Answer 0 (score: 1)
Consider initializing a list of 2,000 elements, each of which will hold a character vector.
# Pre-allocate one list slot per row of dataSet.
df <- vector("list", 2000)
# 1:nrow(dataSet) is a sequence of row indices, so every row is visited.
for (i in 1:nrow(dataSet)){
  if (dataSet$truth[i] == "liver")
  {
    df[[i]] <- (rep(sample(answers, no.tasks, prob = c(0.9, 0.02, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "blood")
  {
    df[[i]] <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.9, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "lung")
  {
    df[[i]] <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.9, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "brain")
  {
    df[[i]] <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.9, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "heart")
  {
    df[[i]] <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.02, 0.9), no.workers)))
  }
}
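For what it's worth, the key difference from the loop in the question is the loop header. `for (i in nrow(dataSet))` iterates over a single value (the row count), so the body runs once, for the last row only, and a plain `df <- ...` would be overwritten on every pass anyway. Iterating over `1:nrow(dataSet)`, or the safer `seq_len(nrow(dataSet))`, visits every row. A minimal sketch on a made-up three-row data frame (`toy` and the `visited_*` names are hypothetical, not from the question):

```r
# Hypothetical 3-row stand-in for dataSet.
toy <- data.frame(truth = c("liver", "blood", "lung"))

# nrow(toy) is the single number 3, so this loop body runs exactly once.
visited_bad <- integer(0)
for (i in nrow(toy)) {
  visited_bad <- c(visited_bad, i)
}

# seq_len(nrow(toy)) is the sequence 1, 2, 3, so every row index is visited.
visited_good <- integer(0)
for (i in seq_len(nrow(toy))) {
  visited_good <- c(visited_good, i)
}

print(visited_bad)   # only the last row index
print(visited_good)  # all row indices
```

`seq_len()` also behaves sensibly for an empty data frame, where `1:nrow(...)` would count down from 1 to 0.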
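Alternatively, you can use lapply(), which outputs a list of the same length as its input (i.e., one element per row of dataSet) and needs no initialization: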
# lapply() builds the result list itself: one element per row index.
df2 <- lapply(seq_len(nrow(dataSet)), function(i){
  if (dataSet$truth[i] == "liver")
  {
    temp <- (rep(sample(answers, no.tasks, prob = c(0.9, 0.02, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "blood")
  {
    temp <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.9, 0.02, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "lung")
  {
    temp <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.9, 0.02, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "brain")
  {
    temp <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.9, 0.02), no.workers)))
  } else if (dataSet$truth[i] == "heart")
  {
    temp <- (rep(sample(answers, no.tasks, prob = c(0.02, 0.02, 0.02, 0.02, 0.9), no.workers)))
  }
  return(temp)
})
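As a quick sanity check that the two approaches are interchangeable, here is a small sketch comparing a pre-allocated for loop with lapply() under the same random seed. The sizes (`n`, three draws per element), the seed, and the `out_*` names are arbitrary choices for illustration only:

```r
answers <- c("liver", "blood", "lung", "brain", "heart")
n <- 4  # hypothetical number of rows

# Version 1: pre-allocate a list, then fill it inside a for loop.
set.seed(123)
out_loop <- vector("list", n)
for (i in seq_len(n)) {
  out_loop[[i]] <- sample(answers, 3, replace = TRUE)
}

# Version 2: lapply() allocates and fills the list in one call.
set.seed(123)
out_lapply <- lapply(seq_len(n), function(i) sample(answers, 3, replace = TRUE))

# Same seed and same order of draws, so the two lists should match.
print(identical(out_loop, out_lapply))
```

Both versions produce a list with one element per index; lapply() just removes the bookkeeping.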
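Better still, you can trim the nested if statements by matching the current dataSet$truth against the answers vector and then replacing the corresponding index in the probability vector with 0.9: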
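df3 <- lapply(seq_len(nrow(dataSet)), function(i){
  # Start with the low weight for every answer ...
  probs <- c(0.02, 0.02, 0.02, 0.02, 0.02)
  # ... then raise the weight of the answer that matches this row's truth.
  probs[match(dataSet$truth[i], answers)] <- 0.9
  temp <- (rep(sample(answers, no.tasks, prob = probs, no.workers)))
  return(temp)
})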
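To tie this back to the question's stated goal (one simulated answer per worker/task row), here is a minimal sketch that draws a single answer per row with the same match() trick and returns a plain character vector via vapply(). This is an assumption about the intent, not part of the code above: the way `dataSet` is constructed, the `task_truth` helper, the `response` column name, and the seed are all hypothetical, since the question does not show how dataSet was built. Note that sample() rescales the weights, so 0.9 against four weights of 0.02 gives a match probability of roughly 0.9 / 0.98.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

answers    <- c("liver", "blood", "lung", "brain", "heart")
no.workers <- 20
no.tasks   <- 100

# Hypothetical reconstruction of dataSet: one row per worker/task pair.
dataSet <- data.frame(
  workerID = rep(seq_len(no.workers), each = no.tasks),
  taskID   = rep(seq_len(no.tasks), times = no.workers)
)

# One true answer per task, shared by all workers (assumed layout).
task_truth <- sample(answers, no.tasks, replace = TRUE)
dataSet$truth <- task_truth[dataSet$taskID]

# Draw exactly one answer per row; vapply() returns a character vector,
# so the result can be stored directly as a column.
dataSet$response <- vapply(seq_len(nrow(dataSet)), function(i) {
  probs <- rep(0.02, length(answers))
  probs[match(dataSet$truth[i], answers)] <- 0.9
  sample(answers, 1, prob = probs)
}, character(1))

# Share of rows where the simulated response equals the truth.
print(mean(dataSet$response == dataSet$truth))
```

The printed share should land near 0.9, which is an easy way to confirm that the simulated answers follow the truth column row by row.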