foreach and doParallel run without errors in R, but do not produce the correct result

Asked: 2019-05-08 20:17:16

Tags: r foreach doparallel

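I am trying to build a foreach loop that applies a fixed set of misspelled-word replacements to a larger data frame. My code runs without errors, but I am not getting the correct result. See the example data frames and the code below.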

I have a main data frame and a lookup data frame that is used to find and replace predefined misspellings in the text of the main data frame:

#create main data frame
df <- data.frame("Index" = 1:7, "Text" = c("Brad came to dinner with us tonigh.",
                                            "Wuld you like to trave with me?",
                                            "There is so muh to undestand.",
                                            "Sentences cone in many shaes and sizes.",
                                            "Learnin R is fun",
                                            "yesterday was Friday",
                                            "bing search engine"), stringsAsFactors = FALSE)

#create predefined misspelled data frame
df_r <- data.frame("misspelled" = c("tonigh", "Wuld", "trave", "muh", "undestand", "shaes", "Learnin"), 
                   "correction" = c("tonight", "Would", "travel", "much", "understand", "shapes", "Learning"))

library(DataCombine)
library(doParallel)
library(foreach)
no_cores <- detectCores()
cl <- makeCluster(no_cores[1]-1)
registerDoParallel(cl)

df_replacement <- foreach((df$Text), .combine = cbind) %dopar% {
  replacement = DataCombine::FindReplace(data = df, Var = "Text", replaceData = df_r,
                                             from = "misspelled", to = "correction", exact = FALSE)

  replacement
}
stopCluster(cl)

I am not sure what I did wrong in the foreach part. Any suggestions are appreciated.

1 Answer:

Answer 0: (score: 1)

I think you are looking for this:

df_replacement <- foreach(i = (rownames(df)), .combine = rbind) %dopar% {
  replacement = DataCombine::FindReplace(data = df[i,], Var = "Text", replaceData = df_r,
                                         from = "misspelled", to = "correction", exact = FALSE)

  replacement
}

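What happened

foreach knows it has to run once per row, so 7 times. But your function always passes the whole data frame to FindReplace, so every iteration also returns the whole data frame, 7 rows by 2 columns. .combine = cbind then binds those results together column-wise: 2 (columns) * 7 (iterations) = 14 columns. So make sure FindReplace is given only the rows it should process in each iteration, not the whole data frame.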
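
The column arithmetic above can be reproduced without a cluster. The sketch below is only an illustration: it uses base R's lapply and Reduce as stand-ins for the foreach loop and its .combine argument, and a toy 7 x 2 data frame in place of the real one. The names whole_each_time and one_row_each_time are made up for this example.

```r
# Toy stand-in for the question's data: 7 rows, 2 columns.
df <- data.frame(Index = 1:7, Text = letters[1:7], stringsAsFactors = FALSE)

# Original pattern: every iteration returns the WHOLE data frame,
# and the pieces are glued together column-wise (like .combine = cbind).
whole_each_time <- lapply(seq_len(nrow(df)), function(i) df)
combined_cbind <- Reduce(cbind, whole_each_time)
dim(combined_cbind)  # 7 rows, 14 columns

# Fixed pattern: every iteration returns ONE row,
# and the pieces are stacked row-wise (like .combine = rbind).
one_row_each_time <- lapply(seq_len(nrow(df)), function(i) df[i, ])
combined_rbind <- Reduce(rbind, one_row_each_time)
dim(combined_rbind)  # 7 rows, 2 columns
```

The first result has the original shape repeated seven times side by side, which is why the output looked wrong even though nothing failed.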

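I edited the code so that FindReplace is called only on the row df[i, ] in each iteration. I also changed cbind to rbind, because you want to append rows rather than columns afterwards.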
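
To check that the parallel loop returns what you expect, it can help to compute a sequential baseline first. The following is a minimal sketch in base R only; fix_text is a hypothetical helper written for this example, not a DataCombine function, and it assumes plain substring replacement, which is roughly what exact = FALSE is expected to do.

```r
df <- data.frame(Index = 1:7,
                 Text = c("Brad came to dinner with us tonigh.",
                          "Wuld you like to trave with me?",
                          "There is so muh to undestand.",
                          "Sentences cone in many shaes and sizes.",
                          "Learnin R is fun",
                          "yesterday was Friday",
                          "bing search engine"),
                 stringsAsFactors = FALSE)

df_r <- data.frame(misspelled = c("tonigh", "Wuld", "trave", "muh",
                                  "undestand", "shaes", "Learnin"),
                   correction = c("tonight", "Would", "travel", "much",
                                  "understand", "shapes", "Learning"),
                   stringsAsFactors = FALSE)

# Hypothetical helper: apply each lookup row in turn as a literal
# (fixed = TRUE) substring replacement over the whole character vector.
fix_text <- function(text, lookup) {
  for (k in seq_len(nrow(lookup))) {
    text <- gsub(lookup$misspelled[k], lookup$correction[k], text, fixed = TRUE)
  }
  text
}

expected <- df
expected$Text <- fix_text(df$Text, df_r)
expected$Text
```

Because the matching is on substrings, a text that already contains "Learning" would become "Learningg" under this approach, so the lookup table should be checked for entries that are prefixes of correct words.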