Question

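(There are no tags for r-parallel-processing or for R's foreach package; if there were, I would have used them here. Tag suggestions are welcome.)

I have a data frame `training_data` and a vector `cats` (the names of the categorical variables).

`cats` looks like c("fruits", "vegetables", "meats")

I want to loop over each of the cats columns in training_data and replace any low-frequency levels with "Other".

This works: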

library(foreach)
foreach(c = cats) %do% { # not parallel processing
  print(c)
  freqs <- table(training_data[c])
  low_freqs <- names(which(freqs < 20000))
  training_data[c][[1]] <- ifelse(training_data[c][[1]] %in% low_freqs, "Other", training_data[c][[1]])
  return(NULL) # otherwise spits out the whole thing
}

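On each iteration, the first line, print(c), outputs the element of cats currently being processed, and I see it in the console:

"fruits" "vegetables" "meats"

After these, three NULLs are printed to the terminal, as expected given the last line of the loop body. When I then inspect training_data, the categorical variables have been transformed as expected: any level with a frequency below 20k has been replaced with "Other".

However, if I try to run it in parallel: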

library(foreach)
foreach(c = cats) %dopar% { # parallel (I have 8 cores)
  print(c)
  freqs <- table(training_data[c])
  low_freqs <- names(which(freqs < 20000))
  training_data[c][[1]] <- ifelse(training_data[c][[1]] %in% low_freqs, "Other", training_data[c][[1]])
  #return(NULL) # otherwise spits out the whole thing
}

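All that happens is that NULL is printed to the console. training_data is not transformed, and nothing from print(c) appears in the console.

Why does this work only with %do% and not with %dopar%?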

Answer 1

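Here is one approach, using some different sample data. With %dopar% and a parallel backend, the data being iterated over is copied into the worker processes, so changes made inside the loop body affect only those copies and never reach the objects in your session; the results have to be returned from the loop instead. Because of that copying, limiting what gets copied also matters for performance and memory use.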

library(doParallel)

# make a cluster

cl <- makeCluster(2)
registerDoParallel(cl)

# sample data

cats <- c("fruits", "vegetables", "meats")
df <- read.csv(text = "
cat,n
fruits,1
fruits,2
vegetables,4
meats,5
", stringsAsFactors = FALSE)

# Use foreach to iterate over a split data frame, so only the relevant
# subset is copied into each parallel process. Specify .combine = rbind to
# bind the per-subset results back into a single data frame

result <- foreach(dfs = split(df, df$cat), .combine = rbind) %dopar% {

  # Print to show the structure of each split. Nothing appears in the main
  # console when running in parallel, because the output goes to each
  # worker process's own output instead
  cat("Inside...\n")
  print(dfs)

  # Derive a new column
  dfs$new_col <- ifelse(dfs$n > 2, ">2", "<=2")

  # Return the result without printing
  invisible(dfs)  
}

# Shut down the worker processes now that the loop has finished
stopCluster(cl)

# Print the combined new dataframe
print(result)
#>          cat n new_col
#> 1     fruits 1     <=2
#> 2     fruits 2     <=2
#> 4      meats 5      >2
#> 3 vegetables 4      >2
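
The same idea can be applied to the column-wise recoding from the question. The following is only a minimal sketch under stated assumptions: the small `training_data` below is a made-up stand-in, the threshold `min_count` is a hypothetical name set to 2 rather than 20000, and a two-worker cluster is assumed. It first shows that an assignment made inside a %dopar% body does not reach the data frame in the calling session, and then returns each recoded column from the loop and assigns the results back afterwards.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)
registerDoParallel(cl)

# Made-up stand-in for the question's training_data (assumption)
training_data <- data.frame(
  fruits     = c("apple", "apple", "kiwi"),
  vegetables = c("leek", "kale", "kale"),
  meats      = c("pork", "pork", "pork"),
  stringsAsFactors = FALSE
)
cats <- c("fruits", "vegetables", "meats")
min_count <- 2  # hypothetical threshold; the question uses 20000

# 1. Assigning inside the body only modifies the copy held by the worker
ignored <- foreach(col = cats) %dopar% {
  training_data[[col]] <- "Other"
  NULL
}
unchanged <- !any(training_data == "Other")

# 2. Iterate over the columns themselves, so each task receives only one
#    column, and RETURN the recoded vector instead of assigning to it
recoded <- foreach(x = as.list(training_data[cats])) %dopar% {
  freqs <- table(x)
  low_freqs <- names(freqs)[freqs < min_count]
  ifelse(x %in% low_freqs, "Other", x)
}

# Write the returned columns back in the calling session
training_data[cats] <- recoded

stopCluster(cl)
print(training_data)
```

With the default combine behaviour, `recoded` is a list with one element per column, in the same order as `cats`, which is why it can be assigned back in a single step. If seeing print() output from the workers matters, creating the cluster with `makeCluster(2, outfile = "")` may help when R is run from a terminal, although that behaviour depends on the front end in use.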

%do% vs. %dopar%: %dopar% makes no changes and gives no warning
