SparkR :: gapply issue

Asked: 2019-04-30 16:49:25

Tags: r databricks sparkr

Trying to use the gapply function in Databricks for parallel processing. I am facing two issues:

- When I use the code as given in the documentation, the run sometimes fails with deserializer == "byte" is not TRUE. It is not consistent: sometimes it runs, sometimes it does not.
- I want to pass a custom function that calls a function from another package (tsintermittent), but it fails with an error saying the function cannot be found.

Has anyone run into a similar problem, or any ideas on how to fix it?

library(SparkR)
library(tsintermittent)

df <- createDataFrame(
  list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 0.3)),
  c("a", "b", "c", "d"))

schema <- structType(structField("a", "integer"),
                     structField("c", "string"),
                     structField("avg", "double"))

result <- SparkR::gapply(df, c("a", "c"),
  function(key, x) {
    # library(tsintermittent)
    y <- data.frame(key, crost(x$d), stringsAsFactors = FALSE)
  },
  schema)
head(result)

Error:

org.apache.spark.SparkException: R computation failed with
Error in compute(mode, partition, serializer, deserializer, keys[[i]],  : 
deserializer == "byte" is not TRUE
Calls: compute -> stopifnot
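
One pattern that typically fixes the "function not found" part is to attach the package inside the function passed to gapply: the function executes on the Spark executors, which do not see library() calls made on the driver, so the package has to be installed on every worker node and attached inside the applied function. Below is a minimal sketch along those lines; it assumes tsintermittent is installed on all cluster nodes, and it treats crost()'s frc.out element as the forecast vector, which is an assumption about that package's return value.

# Reusing df and schema as defined above.
result <- SparkR::gapply(df, c("a", "c"),
  function(key, x) {
    # Attach the package on the worker; library() calls made on the
    # driver are not visible inside the gapply function.
    library(tsintermittent)
    # crost() can fail on very short series, so fall back to NA
    # instead of failing the whole job. (Assumption: frc.out holds
    # the out-of-sample forecast.)
    fc <- tryCatch(mean(crost(x$d)$frc.out), error = function(e) NA_real_)
    # Return exactly the columns declared in the schema: the returned
    # data.frame must match it in column count and types.
    data.frame(a = key[[1]], c = key[[2]], avg = fc, stringsAsFactors = FALSE)
  },
  schema)
head(result)

Note that the original snippet builds data.frame(key, crost(x$d), ...), and crost() returns a list rather than a single double, so even when it runs, the result would not match the three-column schema declared above.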

0 Answers