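I'm trying to write a foreach loop that replaces predefined misspelled words across a larger data frame. The code runs without errors, but the result is not what I expect. A sample of the data and the code I used are below.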
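I have a main data frame and a second data frame that serves as a lookup table for finding and replacing predefined misspellings in the main data frame: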
#create main data frame
df <- data.frame("Index" = 1:7, "Text" = c("Brad came to dinner with us tonigh.",
                                           "Wuld you like to trave with me?",
                                           "There is so muh to undestand.",
                                           "Sentences cone in many shaes and sizes.",
                                           "Learnin R is fun",
                                           "yesterday was Friday",
                                           "bing search engine"), stringsAsFactors = FALSE)
#create predefined misspelled data frame
df_r <- data.frame("misspelled" = c("tonigh", "Wuld", "trave", "muh", "undestand", "shaes", "Learnin"),
                   "correction" = c("tonight", "Would", "travel", "much", "understand", "shapes", "Learning"))
library(DataCombine)
library(doParallel)
library(foreach)
no_cores <- detectCores()
cl <- makeCluster(no_cores[1]-1)
registerDoParallel(cl)
df_replacement <- foreach((df$Text), .combine = cbind) %dopar% {
  replacement = DataCombine::FindReplace(data = df, Var = "Text", replaceData = df_r,
                                         from = "misspelled", to = "correction", exact = FALSE)
  replacement
}
stopCluster(cl)
I'm not sure what I'm doing wrong in the foreach part. Any advice is appreciated.
Answer (score: 1)
I think this is what you're looking for:
df_replacement <- foreach(i = (rownames(df)), .combine = rbind) %dopar% {
  replacement = DataCombine::FindReplace(data = df[i,], Var = "Text", replaceData = df_r,
                                         from = "misspelled", to = "correction", exact = FALSE)
  replacement
}
What was going wrong
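foreach knows it has to run once per element you give it, here 7 times, one per row. But inside the loop your function always works on the entire data frame, so every iteration also returns the entire data frame: 7 rows by 2 columns. .combine = cbind then binds those results together column-wise, giving 2 (columns) * 7 (iterations) = 14 columns. So make sure FindReplace only receives the row for the current iteration instead of the whole data frame on every pass.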
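The shape problem can be reproduced without any parallel backend. Below is a minimal base-R sketch; it uses lapply plus do.call in place of foreach, and a toy data frame called toy with placeholder text, both assumptions for illustration only. Each iteration either returns the whole data frame (like the original loop) or a single row (like the fixed loop):

```r
# Toy stand-in for df: 7 rows, 2 columns (placeholder text, not the real data)
toy <- data.frame(Index = 1:7, Text = letters[1:7], stringsAsFactors = FALSE)

# Like the original loop: every iteration returns the whole 7 x 2 data frame,
# and the results are combined column-wise, as .combine = cbind does
whole_each_time <- lapply(toy$Text, function(x) toy)
wide <- do.call(cbind, whole_each_time)
dim(wide)  # 7 rows, 14 columns

# Like the fixed loop: every iteration returns only row i, combined row-wise
one_row_each_time <- lapply(rownames(toy), function(i) toy[i, ])
tall <- do.call(rbind, one_row_each_time)
dim(tall)  # 7 rows, 2 columns
```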
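I changed this by passing only the current row, df[i,], to FindReplace in each iteration. I also changed cbind to rbind, because you want to stack the results as rows rather than columns.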
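For comparison, here is a minimal base-R sketch of the same lookup-table replacement without DataCombine or a parallel backend. It assumes substring replacement is what you want (which is what exact = FALSE asks FindReplace for), and it uses gsub(..., fixed = TRUE) so the misspelled words are matched literally rather than as regular expressions. The helper name fix_spelling is hypothetical. Because gsub is already vectorized, each correction is applied to the whole Text column at once:

```r
# Hypothetical helper: apply each (misspelled -> correction) pair in turn
# to a character vector, replacing literal substrings
fix_spelling <- function(text, lookup, from = "misspelled", to = "correction") {
  pattern <- as.character(lookup[[from]])      # as.character guards against factor columns
  replacement <- as.character(lookup[[to]])
  for (k in seq_along(pattern)) {
    text <- gsub(pattern[k], replacement[k], text, fixed = TRUE)
  }
  text
}

df <- data.frame("Index" = 1:7, "Text" = c("Brad came to dinner with us tonigh.",
                                           "Wuld you like to trave with me?",
                                           "There is so muh to undestand.",
                                           "Sentences cone in many shaes and sizes.",
                                           "Learnin R is fun",
                                           "yesterday was Friday",
                                           "bing search engine"), stringsAsFactors = FALSE)
df_r <- data.frame("misspelled" = c("tonigh", "Wuld", "trave", "muh", "undestand", "shaes", "Learnin"),
                   "correction" = c("tonight", "Would", "travel", "much", "understand", "shapes", "Learning"))

# One pass over the lookup table corrects every row
df$Text <- fix_spelling(df$Text, df_r)
print(df$Text)
```

Because only one gsub call per lookup pair is needed, this may make the cluster unnecessary; it is worth timing on your real data before reaching for foreach.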