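I apologise in advance for explaining my question in such detail. I used three lapply() steps, shuffle100, my_list and Final_lists (below), to generate 10 nested data frames in a master list from classification tree class probabilities (grouping factors: G8 and V4). Sorry for asking such a simple question, but I can't figure it out. Many thanks if anyone finds a solution.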
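(1) I would like to insert the function confusionMatrix() from the caret package into the function shuffle100, so that it produces 10 confusion matrices, one for each subset.
shuffle100, my_list and Final_lists: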
library(plyr)
library(caret)
library(e1071)
library(rpart)
set.seed(1235)
shuffle100 <- lapply(seq(10), function(n) { # produce 10 data frames
  subset <- normalised_scores[sample(nrow(normalised_scores), 80), ] # shuffle rows
  subset_idx <- sample(1:nrow(subset), replace = FALSE)
  subset <- subset[subset_idx, ] # training subset
  subset1 <- subset[-subset_idx, ] # test subset
  subset_resampled_idx <- createDataPartition(subset_idx, times = 1, p = 0.7, list = FALSE) # 70 % training set
  subset_resampled <- subset[subset_resampled_idx, ]
  ct_mod <- rpart(Matriline ~ ., data = subset_resampled, method = "class", control = rpart.control(cp = 0.005)) # 10 classification trees
  ct_pred <- predict(ct_mod, newdata = subset[, 2:13])
  ct_dataframe <- as.data.frame(ct_pred) # create new data frame
  confusionMatrix(ct_dataframe, normalised_scores$Family)
})
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
1: lapply(seq(10), function(n) {
subset <- normalised_scores[sample(nrow(normalised_scores
2: FUN(X[[i]], ...)
3: confusionMatrix(ct_dataframe, normalised_scores$Family)
4: confusionMatrix.default(ct_dataframe, normalised_scores$Family)
5: factor(data)
6: sort.list(y)
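The error comes from confusionMatrix() receiving a data frame of class probabilities (ct_dataframe) instead of a factor of predicted classes, together with a reference vector (normalised_scores$Family) for the whole data set rather than for the rows that were predicted. Note also that subset[-subset_idx, ] is always empty, because subset_idx is a permutation of every row index. Below is a minimal sketch of a fix, assuming the G8/V4 labels sit in a factor column named Family (the original formula uses Matriline; substitute whichever column holds the classes) and the 12 predictors in columns 2:13. The data frame is simulated stand-in data, not the real normalised_scores, and the split is stratified on the class column with createDataPartition() rather than on row indices.

```r
library(rpart)
library(caret)

set.seed(1235)

# Simulated stand-in for normalised_scores (hypothetical data):
# column 1 is the class (G8/V4), columns 2:13 are numeric predictors.
n_rows <- 120
normalised_scores <- data.frame(
  Family = factor(sample(c("G8", "V4"), n_rows, replace = TRUE)),
  matrix(rnorm(n_rows * 12), ncol = 12)
)

shuffle100 <- lapply(seq(10), function(i) {
  # Draw 80 random rows, then split them 70/30 into training and test sets
  sampled   <- normalised_scores[sample(nrow(normalised_scores), 80), ]
  train_idx <- createDataPartition(sampled$Family, times = 1, p = 0.7, list = FALSE)
  train_set <- sampled[train_idx, ]
  test_set  <- sampled[-train_idx, ]

  ct_mod <- rpart(Family ~ ., data = train_set, method = "class",
                  control = rpart.control(cp = 0.005))

  # Class probabilities (columns G8 and V4) and hard class labels for the test rows
  ct_prob  <- as.data.frame(predict(ct_mod, newdata = test_set, type = "prob"))
  ct_class <- predict(ct_mod, newdata = test_set, type = "class")

  # confusionMatrix() needs two factors of equal length with the same levels
  list(probs  = ct_prob,
       actual = test_set$Family,
       cm     = confusionMatrix(ct_class, test_set$Family))
})

shuffle100[[1]]$cm  # confusion matrix for the first subset
```

Each list element keeps the probabilities and the matching observed classes, so the Predicted, Actual and Binary columns can later be built from the same rows.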
# Produce three columns: Predicted, Actual and Binary
my_list <- lapply(shuffle100, function(df) { # create the three new columns
  if (nrow(df) > 0)
    cbind(df, Predicted = c(""), Actual = c(""), Binary = c(""))
  else
    cbind(df, Predicted = character(), Actual = character(), Binary = character())
})
# Fill the empty columns with NAs
Final_lists <- lapply(my_list, function(x) mutate(x, Predicted = NA, Actual = NA, Binary = NA))
#Create a dataframe from the column normalised_scores$Family to fill the Actual column
Actual_scores<-Final_normalised3$Family
Final_scores<-as.data.frame(Actual_scores)
# Fill in the Predicted, Actual and Binary columns
Predicted_Lists <- Final_lists %>%
  mutate(Predicted = ifelse(G8 > V4, G8, V4)) %>% # assuming if G8 > V4 then Predicted=G8
  mutate(Actual = Final_scores) %>% # your definition of Actual is not clear
  mutate(Binary = ifelse(Predicted == Actual, 1, 0))
#Error messages
Error in ifelse(G8 > V4, G8, V4) : object 'G8' not found
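The pipeline above passes the whole list to mutate(), which looks for a column G8 in the list itself rather than in each data frame, hence "object 'G8' not found". In addition, ifelse(G8 > V4, G8, V4) would return the probability, not the label. One possible fix, sketched below, is to fill the columns inside a function and call it once per data frame with Map(), pairing each subset with its own vector of observed classes. The list contents and actual_list are hypothetical examples; in practice actual_list must be in the same row order as each subset (for example the test-set Family values). The same columns could equally be added with dplyr::mutate() inside the function.

```r
# Hypothetical list of subsets holding class probabilities
Final_lists <- list(
  data.frame(G8 = c(0.1764706, 0.7692308), V4 = c(0.8235294, 0.2307692)),
  data.frame(G8 = c(0.7692308, 0.1764706), V4 = c(0.2307692, 0.8235294))
)
# Hypothetical observed classes: one vector per subset, in matching row order
actual_list <- list(c("V4", "V4"), c("G8", "V4"))

fill_columns <- function(df, actual) {
  # Label each row with the class that has the larger probability (ties go to V4)
  df$Predicted <- ifelse(df$G8 > df$V4, "G8", "V4")
  df$Actual    <- as.character(actual)
  # 1 when prediction and observation agree, 0 otherwise
  df$Binary    <- as.integer(df$Predicted == df$Actual)
  df
}

# Apply fill_columns() to each (data frame, actual vector) pair
Predicted_Lists <- Map(fill_columns, Final_lists, actual_list)
Predicted_Lists[[1]]
```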
(2) I would like to write a function or for loop that fills in the Predicted, Actual and Binary columns of each subset, depending on whether the probability in each row of column G8 is larger or smaller than that in column V4. However, I'm confused about the correct syntax for functions and loops.
The for loop does not work:
for (i in 1:length(Final_lists)) { # i loops through each data frame in the list
  for (j in 2:nrow(Final_lists[[i]])) { # j loops through each row of each data frame in the list
    if (Final_lists[[i]][j, "G8"] > Final_lists[[i]][j, "V4"]) { # if the probability of G8 > V4 in this row
      Final_lists[[i]][j, [j["Predicted" == "NA"]] = "G8" # G8 will be filled into the same row of the Predicted column
    }
    else {
      Final_lists[[i]][j, [Predicted == "NA"]] = "V4" # V4 will be filled into the same row of the Predicted column
    }
    print(i)
  }
}
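In the loop above, Final_lists[[i]][j, [j["Predicted" == "NA"]]] is not valid R syntax: a column is selected by name inside a single pair of brackets, as in [j, "Predicted"]. Starting j at 2 also skips the first row. A corrected version of the same nested loop, shown on a hypothetical one-element list:

```r
# Hypothetical input: a list holding one data frame with empty columns
Final_lists <- list(
  data.frame(G8 = c(0.1764706, 0.7692308), V4 = c(0.8235294, 0.2307692),
             Predicted = NA_character_, Actual = NA_character_, Binary = NA_integer_,
             stringsAsFactors = FALSE)
)

for (i in seq_along(Final_lists)) {            # each data frame in the list
  for (j in seq_len(nrow(Final_lists[[i]]))) { # each row, starting at row 1
    if (Final_lists[[i]][j, "G8"] > Final_lists[[i]][j, "V4"]) {
      Final_lists[[i]][j, "Predicted"] <- "G8"
    } else {
      Final_lists[[i]][j, "Predicted"] <- "V4"
    }
  }
}

Final_lists[[1]]$Predicted
```

seq_len() and seq_along() also behave correctly for empty data frames, where 1:nrow(df) would produce c(1, 0).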
Once the columns are filled, each subset should have this format:
G8         V4         Predicted  Actual  Binary
0.1764706  0.8235294  V4         V4      1
0.7692308  0.2307692  G8         V4      0
0.7692308  0.2307692  G8         V4      0
0.7692308  0.2307692  G8         V4      0
0.7692308  0.2307692  G8         V4      0
0.1764706  0.8235294  V4         V4      1
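Predicted column: if the probability of G8 > V4, the empty Predicted row is assigned G8; if V4 > G8, the empty Predicted row is assigned V4.
Actual column: the actual (observed) classes of each subset, i.e. the values the classification tree model is trying to predict. They are contained in the Family column of the data frame normalised_scores.
Binary column: if the Predicted and Actual rows have the same value (e.g. G8 and G8), the Binary row is assigned 1; if the Predicted and Actual rows differ (e.g. G8 and V4), the Binary row is assigned 0.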
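I achieved these goals for a single data frame with the following working code, but I'm not sure how to apply it to the subsets in the master list: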
set.seed(1235)
# Randomly permute the data before subsetting
mydat_idx <- sample(1:nrow(Final_normalised_scores), replace = FALSE)
mydat <- Final_normalised3[mydat_idx, ]
mydat_resampled_idx <- createDataPartition(mydat_idx, times = 1, p = 0.7, list = FALSE)
mydat_resampled <- mydat[mydat_resampled_idx, ] # Training portion of the data
mydat_resampled1 <- mydat[-mydat_resampled_idx, ]
# Classification tree
ct_mod <- train(x = mydat_resampled[, 2:13], y = as.factor(mydat_resampled[, 1]),
                method = "rpart",
                trControl = trainControl(method = "repeatedcv", number = 10, repeats = 100, classProbs = TRUE))
# Model predictions
ct_pred <- predict(ct_mod, newdata = mydat[, 2:13], type = "prob")
Final_Predicted <- as.data.frame(ct_pred)
# Produce three empty columns: Predicted, Actual and Binary
Final_Predicted$Predicted <- NA
Final_Predicted$Actual <- NA
Final_Predicted$Binary <- NA
# Fill in the Predicted column
for (i in 1:length(Final_Predicted$G8)) {
  if (Final_Predicted$G8[i] > Final_Predicted$V4[i]) {
    Final_Predicted$Predicted[i] <- "G8"
  } else {
    Final_Predicted$Predicted[i] <- "V4"
  }
  print(i)
}
# Fill in the Actual column with the observed classes. ct_pred was computed on the
# shuffled rows of mydat, so take Family from mydat to keep the rows aligned
Final_Predicted$Actual <- mydat$Family
# Fill in the Binary column
for (i in 1:length(Final_Predicted$Binary)) {
  if (Final_Predicted$Predicted[i] == Final_Predicted$Actual[i]) {
    Final_Predicted$Binary[i] <- 1
  } else {
    Final_Predicted$Binary[i] <- 0
  }
  print(i)
}
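Both for loops above can be replaced by vectorised ifelse() calls that apply the same rules to whole columns at once, without index bookkeeping or print(i). A sketch using the six example rows from the expected-output table as hypothetical input; the Actual values (all V4) are taken from that table, not from real data:

```r
# Hypothetical stand-in for Final_Predicted: the six example rows from the question
Final_Predicted <- data.frame(
  G8 = c(0.1764706, 0.7692308, 0.7692308, 0.7692308, 0.7692308, 0.1764706),
  V4 = c(0.8235294, 0.2307692, 0.2307692, 0.2307692, 0.2307692, 0.8235294)
)

# Same rules as the loops, applied to whole columns
Final_Predicted$Predicted <- ifelse(Final_Predicted$G8 > Final_Predicted$V4, "G8", "V4")
Final_Predicted$Actual    <- rep("V4", 6)  # hypothetical observed classes
Final_Predicted$Binary    <- ifelse(Final_Predicted$Predicted == Final_Predicted$Actual, 1, 0)

Final_Predicted
```

To apply this to the master list, the three assignments can be wrapped in a function taking a data frame and its vector of observed classes, then called once per subset with Map() or lapply().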
summarySE (Rmisc package) to produce a bar plot with error bars (ggplot2)
Answer (score: 1)
Your description of the problem is a bit long, but a possible dplyr solution looks like this:
library(dplyr)
Final_Predicted$Actual <- ... # fill actual values
Final_Predicted <- Final_Predicted %>%
  mutate(Predicted = ifelse(G8 > V4, "G8", "V4")) %>% # assuming if G8 == V4 then Predicted = V4
  mutate(Binary = ifelse(Predicted == Actual, 1, 0))
I haven't actually run this solution, but it should be something along these short and simple lines. Hope this helps.