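I recently started using the parallel package in R, and it has worked wonders for me. However, I have run into a problem that I cannot find an answer to.
I am trying to reformat some data, and for that I use sapply(), or parSapply() in the parallel case. In the normal (sequential) case, I run: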
sapply(1:nrow(aux),function(x){
  r=which(M$Project==aux$Projecte[x] & M$Product==aux$Producte[x])
  c=which(names(M)==aux$Atribut[x])
  l=aux$meanss[x]
  M[r,c]<<-l
})
Here <<- is used to assign the value to M in the global environment. In the parallel case, I run:
library(parallel)
no_cores <- detectCores()-2
cl <- makeCluster(no_cores)
# copy aux and M into the global environment of every worker
clusterExport(cl,c("aux","M"))
parSapply(cl,1:20,function(x){
  r=which(M$Project==aux$Projecte[x] & M$Product==aux$Producte[x])
  c=which(names(M)==aux$Atribut[x])
  l=aux$meanss[x]
  M[r,c]<<-l
})
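I know the values are being computed (I printed them), but they are not assigned to the M data frame the way they are with sapply(). I have looked around but have not found anything about this. Is there anything special to take into account when assigning values inside a parallel apply function?
Thanks. A reproducible example follows.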
M:
structure(list(Project = c("11I040119", "11I040119", "11I040119",
"11I040119", "11I040119", "11I040119", "11I040119", "11I040119",
"11I040119", "11I040119", "11I040119", "11I040119", "11I040119"
), Product = c("Brulerie St. Denis (BOLD)", "Ethical Beans (BOLD)",
"Folgers (BOLD)", "Illy drip coffe (BOLD)", "Illy Espresso Coffee (BOLD)",
"Just Us (BOLD)", "Lavazza caffè espresso (BOLD)", "Lavazza Crema e gusto (BOLD)",
"Lavazza Tierra (BOLD)", "Medaglia d'Oro (BOLD)", "Seattle Best 4 (BOLD)",
"Starbucks café Verona (BOLD)", "Tully's (BOLD)"), Thing1 = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Thing2 = c(0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 13L), class = "data.frame")
aux:
structure(list(Projecte = c("11I040119", "11I040119", "11I040119",
"11I040119", "11I040119", "11I040119", "11I040119", "11I040119",
"11I040119", "11I040119", "11I040119", "11I040119", "11I040119",
"11I040119", "11I040119"), Producte = c("Brulerie St. Denis (BOLD)",
"Ethical Beans (BOLD)", "Folgers (BOLD)", "Illy drip coffe (BOLD)",
"Illy Espresso Coffee (BOLD)", "Just Us (BOLD)", "Lavazza caffè espresso (BOLD)",
"Lavazza Crema e gusto (BOLD)", "Lavazza Tierra (BOLD)", "Medaglia d'Oro (BOLD)",
"Seattle Best 4 (BOLD)", "Starbucks café Verona (BOLD)", "Tully's (BOLD)",
"Brulerie St. Denis (BOLD)", "Ethical Beans (BOLD)"), Thing = c("Thing1",
"Thing1", "Thing1", "Thing1", "Thing1", "Thing1", "Thing1", "Thing1",
"Thing1", "Thing1", "Thing1", "Thing1", "Thing1", "Thing2", "Thing2"
), Value = c(0.142857142857143, 0.242857141154153, 0.614285715988704,
0, 0, 0.0714285714285714, 1.01428570917674, 0, 0.971428564616612,
0.5, 0.357142857142857, 0.642857142857143, 0.714285714285714,
3, 5)), row.names = c(NA, 15L), class = "data.frame")
Desired output (M):
   Project   Product                       Thing1     Thing2
1  11I040119 Brulerie St. Denis (BOLD)     0.14285714 3
2  11I040119 Ethical Beans (BOLD)          0.24285714 5
3  11I040119 Folgers (BOLD)                0.61428572 0
4  11I040119 Illy drip coffe (BOLD)        0.00000000 0
5  11I040119 Illy Espresso Coffee (BOLD)   0.00000000 0
6  11I040119 Just Us (BOLD)                0.07142857 0
7  11I040119 Lavazza caffè espresso (BOLD) 1.01428571 0
8  11I040119 Lavazza Crema e gusto (BOLD)  0.00000000 0
9  11I040119 Lavazza Tierra (BOLD)         0.97142856 0
10 11I040119 Medaglia d'Oro (BOLD)         0.50000000 0
11 11I040119 Seattle Best 4 (BOLD)         0.35714286 0
12 11I040119 Starbucks café Verona (BOLD)  0.64285714 0
13 11I040119 Tully's (BOLD)                0.71428571 0
Answer 0 (score: 4)
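This is not a complete solution, but in brief: parallelization works by starting multiple processes (imagine running several R sessions side by side). Each of these processes has its own global environment, .GlobalEnv, so your M[r,c] <<- l is actually assigning to a different place in each process: the worker's own copy of M, not the M in your main session.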
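The effect of separate global environments can be seen in a minimal sketch. The variable counter is a hypothetical stand-in for M, and the two-worker cluster size is an arbitrary assumption; the point is only that <<- inside a worker changes the worker's copy.

```r
library(parallel)

# Hypothetical stand-in for M in the main session
counter <- 0

cl <- makeCluster(2)  # assumed: two worker processes
# Each worker receives its own copy of counter
clusterExport(cl, "counter", envir = environment())

worker_values <- parSapply(cl, 1:4, function(i) {
  counter <<- counter + 1  # changes the copy held by this worker only
  counter
})
stopCluster(cl)

# The main session's counter was never touched
master_value <- counter
print(worker_values)
print(master_value)
```

The workers report incremented values, while counter in the main session keeps its original value of 0.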
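One possible approach is to rewrite your function so that it returns, for example, list(r, c, l), and to call it with parLapply. You then get back a list of indices and values that were computed in parallel, and you perform the assignment in the main process.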
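A minimal sketch of that idea follows, not a definitive implementation. It assumes small stand-in versions of M and aux, a two-worker cluster, and the column names from the reproducible example (Thing and Value) rather than the Atribut and meanss used in the question's code.

```r
library(parallel)

# Small stand-ins for the question's M and aux (assumed, shortened data)
M <- data.frame(
  Project = rep("11I040119", 3),
  Product = c("Folgers (BOLD)", "Just Us (BOLD)", "Tully's (BOLD)"),
  Thing1 = 0, Thing2 = 0,
  stringsAsFactors = FALSE
)
aux <- data.frame(
  Projecte = rep("11I040119", 4),
  Producte = c("Folgers (BOLD)", "Just Us (BOLD)", "Tully's (BOLD)",
               "Folgers (BOLD)"),
  Thing = c("Thing1", "Thing1", "Thing1", "Thing2"),
  Value = c(0.6, 0.07, 0.7, 3),
  stringsAsFactors = FALSE
)

cl <- makeCluster(2)  # assumed: two worker processes
clusterExport(cl, c("aux", "M"), envir = environment())

# Workers only work out where each value belongs; they do not modify M
updates <- parLapply(cl, seq_len(nrow(aux)), function(x) {
  list(
    r = which(M$Project == aux$Projecte[x] & M$Product == aux$Producte[x]),
    c = which(names(M) == aux$Thing[x]),
    l = aux$Value[x]
  )
})
stopCluster(cl)

# The assignment itself happens in the main process
for (u in updates) {
  M[u$r, u$c] <- u$l
}
print(M)
```

Because only the lookups run on the workers, the final loop over updates is sequential; for a cheap lookup like this one, the cost of shipping M and aux to the workers may outweigh any gain from parallelism.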