Incorrect number of dimensions - parallel computing in R

Date: 2015-11-19 20:57:56

Tags: r tm snow

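I'm running into a problem when using the tm package together with parallel computing in R, and I'm not sure whether I'm doing something silly or whether it's a bug.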

I've created a small reproducible example:

# Load the libraries
library(tm)
library(snow)

# Create a Document Term Matrix
test_sentence = c("this is a test", "this is another test")
test_corpus = VCorpus(VectorSource(test_sentence))
test_TM = DocumentTermMatrix(test_corpus)

# Define a simple function that returns the matrix for the i-th document
test_function = function(i, TM){ TM[i, ] }

If I run a simple lapply on this example, I get the expected result without any problems:

# This returns the expected list containing the rows of the matrix
res1 = lapply(1:2, test_function, test_TM)

But if I run it in parallel, I get an error:

first error: incorrect number of dimensions

# This should return the same thing as the lapply above, but instead it stops with an error
cl = makeCluster(2)
res2 = parLapply(cl, 1:2, test_function, test_TM)
stopCluster(cl)

1 Answer:

Answer 0 (score: 2)

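The problem is that the different nodes do not load the tm package automatically. Loading the package is necessary, however, because it makes the methods for the relevant object classes available, including the [ indexing method that TM[i, ] relies on.

The code below does the following:

  1. Start the cluster
  2. Load tm on all nodes
  3. Export all objects to all nodes
  4. Run the function
  5. Stop the cluster

cl <- makeCluster(rep("localhost",2), type="SOCK")
clusterEvalQ(cl, library(tm))
clusterExport(cl, list=ls())
res <- parLapply(cl, as.list(1:2), test_function, test_TM)
stopCluster(cl)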
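Why does a missing package show up as "incorrect number of dimensions" rather than a clearer message? One plausible explanation (an inference, not something the answer spells out): a DocumentTermMatrix is built on slam's simple_triplet_matrix, which is an ordinary list carrying a class attribute. On a worker where tm is not loaded, TM[i, ] finds no [ method for that class and falls back to base list subsetting, which accepts only a single index. The base-R sketch below reproduces this with a hypothetical toy_triplet class; the class and its [ method are illustrative only and are not part of tm or slam.

```r
# Hypothetical stand-in for a sparse triplet matrix: like slam's
# simple_triplet_matrix, it is a plain list with a class attribute.
m <- structure(
  list(i = c(1L, 2L), j = c(1L, 2L), v = c(5, 7), nrow = 2L, ncol = 2L),
  class = "toy_triplet"
)

# No "[" method exists for toy_triplet yet, so R falls back to list
# subsetting, which rejects a two-index call such as m[1, ].
err <- tryCatch(m[1, ], error = function(e) conditionMessage(e))
print(err)  # "incorrect number of dimensions" in an English locale

# Defining the method (loading tm does the analogous thing for its
# classes) lets the same call dispatch correctly.
`[.toy_triplet` <- function(x, i, j, ...) {
  if (missing(j)) j <- seq_len(x$ncol)           # m[1, ] leaves j empty
  dense <- matrix(0, nrow = x$nrow, ncol = x$ncol)
  dense[cbind(x$i, x$j)] <- x$v                  # scatter triplets into a dense matrix
  dense[i, j, drop = FALSE]
}

res <- m[1, ]
print(res)  # a 1 x 2 matrix holding 5 and 0
```

The master session in the question never hits this fallback because library(tm) was called there, which is why the plain lapply works.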
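The same fix carries over to the parallel package that ships with R, which builds on snow's interface (makeCluster, clusterEvalQ, parLapply, stopCluster). The sketch below uses the base splines package purely as a stand-in for tm, so it runs without extra installs, and checks that a package attached on the master is not attached on freshly started workers until clusterEvalQ loads it there. It assumes a default setup in which the workers do not attach splines at startup (for example through a site profile).

```r
library(parallel)

# Start two fresh R worker processes (a PSOCK cluster by default).
cl <- makeCluster(2)

# Attach splines on the master only; it stands in for tm here.
library(splines)

# Ask each worker whether splines is on its search path.
before <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

# Load the package on every worker, as the answer does with library(tm).
invisible(clusterEvalQ(cl, library(splines)))
after <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

stopCluster(cl)

print(before)  # one value per worker, expected FALSE on a default setup
print(after)   # one value per worker, TRUE once clusterEvalQ has run
```

With tm installed, the corresponding change to the question's code is to run clusterEvalQ(cl, library(tm)) after makeCluster and before parLapply(cl, 1:2, test_function, test_TM).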