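I have been trying to parallelize my code, because at the moment I use a double for loop to record the results. I have been looking into how to do this in R with the SNOW and doParallel packages.
If you want a reproducible example, use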
residual_anomalies <- matrix(sample(c('ANOMALY','NO SIGNAL'),300,replace=T),nrow=100)
instead of these three lines
inputfile <- paste0("simulation_",i,"_",metrics[k],"_US.csv")
data <- residuals(inputfile)
residual_anomalies <- conceptdrift(data,length=10,threshold=.05)
inside the nested for loop. The full code is below.
source("GetMetrics.R")
source("slowdrift_resampling_vectorized.R")
metrics <- unique(metrics)
num_metrics <- length(metrics)
f1_scores_table_raw = data.frame(matrix(ncol=10,nrow=46))
f1_scores_table_pred = data.frame(matrix(ncol=10,nrow=46))
rownames(f1_scores_table_raw) <- metrics
colnames(f1_scores_table_raw) <- paste0("Sim",1:10)
rownames(f1_scores_table_pred) <- metrics
colnames(f1_scores_table_pred) <- paste0("Sim",1:10)
for(k in 1:num_metrics){
  for(i in 1:10){
    #inputfile <- paste0("simulation_",i,"_",metrics[k],"_US.csv")
    #data <- residuals(inputfile)
    #residual_anomalies <- conceptdrift(data,length=10,threshold=.05)
    #the above is how I get the data frame but I'll create another one for reproducibility.
    residual_anomalies <- as.data.frame(matrix(sample(c('ANOMALY','NO SIGNAL'),300,replace=T),nrow=100))
    names(residual_anomalies) <- c("Raw_Anomaly","Prediction_Anomaly","True_Anomaly")
    #calculate precision and recall for an F1 score
    #first for raw data
    counts <- ifelse(rowSums(residual_anomalies[c("Raw_Anomaly","True_Anomaly")]=='ANOMALY')==2,1,0)
    correct_detections <- sum(counts)
    total_predicted = sum(residual_anomalies$Raw_Anomaly =='ANOMALY')
    total_actual = sum(residual_anomalies$True_Anomaly =='ANOMALY')
    raw_precision = correct_detections / total_predicted
    raw_recall = correct_detections / total_actual
    f1_raw = 2*raw_precision*raw_recall / (raw_precision+raw_recall)
    #then for prediction (DLM,ESP,MLR) data
    counts <- ifelse(rowSums(residual_anomalies[c("Prediction_Anomaly","True_Anomaly")]=='ANOMALY')==2,1,0)
    correct_detections <- sum(counts)
    total_predicted = sum(residual_anomalies$Prediction_Anomaly =='ANOMALY')
    total_actual = sum(residual_anomalies$True_Anomaly =='ANOMALY')
    pred_precision = correct_detections / total_predicted
    pred_recall = correct_detections / total_actual
    f1_pred = 2*pred_precision*pred_recall / (pred_precision+pred_recall)
    f1_scores_table_raw[[k,i]] <- f1_raw
    f1_scores_table_pred[[k,i]] <- f1_pred
  }
}
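Previously I used foreach with %dopar% on the outer loop, but I kept running into an error saying that '%dopar%' could not be found. Should I parallelize both loops or just one?
I also know that foreach creates a list and stores it in a variable, but can I still store other variables inside the foreach loop? For example, I still want to record data in my f1_scores_table_raw and f1_scores_table_pred arrays.
Thanks!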
Answer 0 (score: 5)
foreach handles this automatically if you put the %:% operator between the loop levels (see the "nesting" vignette):
require(foreach)
# Register parallel backend
foreach (k = 1:num_metrics) %:% # nesting operator
  foreach (i = 1:10) %dopar% {
    # code to parallelise
  }
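To make the skeleton above concrete, here is a minimal sketch of how the nested loop from the question might look end to end. It rests on several assumptions that are not in the original post: the doParallel backend with 2 workers, a reduced num_metrics of 3 so it runs quickly, the random stand-in data from the question instead of the CSV files, and a hypothetical helper f1_score. The %dopar% operator comes from the foreach package, so that package has to be attached before the loop runs. With a cluster backend the loop body runs in separate worker processes, so assigning into f1_scores_table_raw inside the loop would not change the copy in the main session; the sketch therefore returns one small data frame per (k, i) pair and fills the tables afterwards.

```r
library(foreach)     # provides foreach(), %:% and %dopar%
library(doParallel)  # provides registerDoParallel()

# Hypothetical helper (not in the original post): F1 score from two label vectors
f1_score <- function(predicted, actual) {
  correct   <- sum(predicted == "ANOMALY" & actual == "ANOMALY")
  precision <- correct / sum(predicted == "ANOMALY")
  recall    <- correct / sum(actual == "ANOMALY")
  2 * precision * recall / (precision + recall)
}

num_metrics <- 3   # assumption: small value for illustration (the question uses 46)
num_sims    <- 10

cl <- makeCluster(2)      # assumption: 2 worker processes
registerDoParallel(cl)    # register the parallel backend

# One row per (k, i) pair; both loop levels are combined with rbind
results <- foreach(k = 1:num_metrics, .combine = rbind) %:%
  foreach(i = 1:num_sims, .combine = rbind) %dopar% {
    # Random stand-in data, as in the question's reproducible example
    residual_anomalies <- as.data.frame(matrix(
      sample(c("ANOMALY", "NO SIGNAL"), 300, replace = TRUE), nrow = 100))
    names(residual_anomalies) <- c("Raw_Anomaly", "Prediction_Anomaly", "True_Anomaly")
    data.frame(
      k = k,
      i = i,
      f1_raw  = f1_score(residual_anomalies$Raw_Anomaly,
                         residual_anomalies$True_Anomaly),
      f1_pred = f1_score(residual_anomalies$Prediction_Anomaly,
                         residual_anomalies$True_Anomaly))
  }

stopCluster(cl)

# Fill the two tables in the main session from the returned rows
f1_scores_table_raw <- matrix(NA_real_, nrow = num_metrics, ncol = num_sims,
                              dimnames = list(NULL, paste0("Sim", 1:num_sims)))
f1_scores_table_pred <- f1_scores_table_raw
f1_scores_table_raw[cbind(results$k, results$i)]  <- results$f1_raw
f1_scores_table_pred[cbind(results$k, results$i)] <- results$f1_pred
```

Returning the indices k and i alongside the scores means the fill step does not depend on the order in which results arrive. Whether nesting both levels is faster than parallelizing only the outer loop depends on how expensive each iteration is, so it is worth timing both on the real data.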