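I'm trying to optimize loop performance by replacing a 'for loop' with a 'foreach parallel processing loop'. I have about 1000 small data frames with different rows and columns.
My goal is to row-bind all of these data frames into one matrix using the dplyr package's bind_rows. I did some research online on the basics and setup of foreach loops and doParallel, e.g. run a for loop in parallel in R, Parallel R Loops for Windows and Linux, R - parallel computing in 5 minutes (with foreach and doParallel).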
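Below are more details of my environment (data preparation).
Example small data frames. Note: each of these small data frames may have a different number of rows and columns.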
RYW0001_rs <- data.frame(
"A" = c("Coff", "Apple", "Coff", "Milk", "Milk", "Coff"),
"B" = c("ToothB", "Apple", "Orange", NA, "Pear", "Grape"),
"C" = c("ToothP", "ToothP", NA, NA, "ToothB", "Yam"),
"D" = c(NA, "Potato", NA, NA, NA, NA)
)
RYW0002_rs <- data.frame(
"A" = c("Coff", "Apple", "Coff", "Milk", "Milk", "Coff"),
"B" = c(NA, "Potato", NA, NA, NA, NA)
)
RYW0003_rs <- data.frame(
"A" = c("Coff", "Apple", "Coff", "Milk", "Milk", "Coff"),
"B" = c("ToothB", "Apple", "Orange", NA, "Pear", "Grape"),
"C" = c("Apple", "ToothP", "Orange", NA, "Milk", "Grape"),
"D" = c("ToothP", "Orange", NA, NA, "Pear", "Yam"),
"E" = c("ToothP", "ToothP", NA, NA, "ToothB", "Yam"),
"F" = c(NA, "Potato", NA, NA, NA, NA)
)
Store the data frame names as a character vector (used like macro variables):
Merchant_No_rs1 <- c('RYW0001_rs','RYW0002_rs','RYW0003_rs')
Code 1: previous for loop [works fine; it produces the warning messages below, but they don't affect my expected result]
Warning messages:
1: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
2: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
3: In bind_rows_(x, .id) : Unequal factor levels: coercing to character
Step 1: create an EMPTY new temp object
library(dplyr)  # provides bind_rows()
temp <- NULL
Step 2: for loop
for (j in seq_along(Merchant_No_rs1)) {
  # look up each data frame by name and append its rows (new columns are added, filled with NA)
  temp <- bind_rows(temp, get(Merchant_No_rs1[[j]]))
  print(dim(temp))
}
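For comparison, the same result can likely be obtained without an explicit loop: dplyr's bind_rows() also accepts a list of data frames, and base R's mget() turns the vector of names into such a list. The sketch below uses cut-down stand-ins for the example data (hypothetical data, not the real 1000 data frames). stringsAsFactors = FALSE keeps the columns as character, which avoids the "Unequal factor levels" warnings; it is already the default from R 4.0.0 on.

```r
library(dplyr)

# Cut-down stand-ins for the example data frames (hypothetical data, for illustration only)
RYW0001_rs <- data.frame(A = c("Coff", "Apple"), B = c("ToothB", NA), stringsAsFactors = FALSE)
RYW0002_rs <- data.frame(A = c("Milk", "Coff"), C = c("Yam", "Pear"), stringsAsFactors = FALSE)
Merchant_No_rs1 <- c("RYW0001_rs", "RYW0002_rs")

# Loop version from the question: grows temp one data frame at a time
temp <- NULL
for (j in seq_along(Merchant_No_rs1)) {
  temp <- bind_rows(temp, get(Merchant_No_rs1[[j]]))
}

# Single-call version: mget() returns a named list of the data frames,
# and bind_rows() row-binds the whole list, filling missing columns with NA
temp2 <- bind_rows(mget(Merchant_No_rs1))

dim(temp2)
all.equal(as.data.frame(temp), as.data.frame(temp2))
```

Binding once also avoids re-copying the growing temp on every iteration, which is often where most of the time goes when there are many small data frames.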
Code 2: current foreach loop [does not work; it fails with the error below]
Error in { : task 1 failed - "object 'RYW0001_rs' not found"
Step 1: create an EMPTY new temp object
temp <- NULL
Step 2: foreach loop
foreach (j=1:length(Merchant_No_rs1), .packages=c("dplyr"), .export=sprintf("%s",Merchant_No_rs1[[j]])) %dopar% {
temp <- bind_rows(temp, get(Merchant_No_rs1[[j]]))
}
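A probable cause of the error: .export is evaluated once on the master before any task runs, so Merchant_No_rs1[[j]] uses whatever j happens to be at that moment. If Code 1 was run first in the same session, j is still 3, so only "RYW0003_rs" gets exported and task 1 cannot find RYW0001_rs. In addition, foreach cannot see that the body depends on these data frames, because their names only appear as strings passed to get(). The assignment temp <- bind_rows(temp, ...) also only changes a worker-local copy; foreach instead collects each task's return value. A minimal sketch under those assumptions, iterating over the data frames themselves and letting .combine do the binding (cut-down hypothetical data; 2 workers is an arbitrary choice):

```r
library(dplyr)
library(foreach)
library(doParallel)  # also attaches the parallel package (makeCluster, stopCluster)

# Cut-down stand-ins for the example data frames (hypothetical data, for illustration only)
RYW0001_rs <- data.frame(A = c("Coff", "Apple"), B = c("ToothB", NA), stringsAsFactors = FALSE)
RYW0002_rs <- data.frame(A = c("Milk", "Coff"), C = c("Yam", "Pear"), stringsAsFactors = FALSE)
Merchant_No_rs1 <- c("RYW0001_rs", "RYW0002_rs")

cl <- makeCluster(2)
registerDoParallel(cl)

# mget() runs on the master, so each worker receives a data frame directly
# and never needs get() or .export. Each task returns its data frame; the
# results are row-bound in order on the master by .combine.
temp <- foreach(df = mget(Merchant_No_rs1),
                .combine = dplyr::bind_rows,
                .multicombine = TRUE) %dopar% {
  df  # any expensive per-data-frame processing would go here
}

stopCluster(cl)
dim(temp)
```

With data frames this small, starting the workers and shipping the data to them will probably cost more than it saves. Parallelism tends to pay off only when each task does substantial work beyond the bind itself.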
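My expected result is the same as the result of Code 1: even though the small data frames all have different rows and columns, any new columns are appended to the temp table. Below is the result table, temp.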
Question: Is there a way to use a 'foreach loop' for parallel processing that gives the same result as the 'for loop'?
Any help would be greatly appreciated :) Thanks