I'm trying to use the gapply function in Databricks for parallel processing. I'm facing two problems:

1. When I use the code from the documentation, it sometimes fails with deserializer == "byte" is not TRUE. It's not consistent; sometimes it runs and sometimes it doesn't.
2. I want to pass a custom function that calls a function from another package, but it fails with an error saying the function could not be found.

Has anyone run into something similar, or have any ideas how to solve it?
library(SparkR)
library(tsintermittent)

df <- createDataFrame(
  list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 0.3)),
  c("a", "b", "c", "d")
)

schema <- structType(
  structField("a", "integer"),
  structField("c", "string"),
  structField("avg", "double")
)

result <- SparkR::gapply(
  df, c("a", "c"),
  function(key, x) {
    # library(tsintermittent)
    y <- data.frame(key, crost(x$d), stringsAsFactors = FALSE)
  },
  schema
)

head(result)
Error:
org.apache.spark.SparkException: R computation failed with
Error in compute(mode, partition, serializer, deserializer, keys[[i]], :
deserializer == "byte" is not TRUE
Calls: compute -> stopifnot
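For the "function not found" part, my understanding is that the function passed to gapply runs on the worker processes, which do not inherit packages attached on the driver, so the library() call may need to go inside the function itself. A sketch of that variant (assuming tsintermittent is installed on every worker node, and using crost()'s frc.out component, which I believe holds the out-of-sample forecast):

```r
library(SparkR)

df <- createDataFrame(
  list(list(1L, 1, "1", 0.1), list(1L, 2, "1", 0.2), list(3L, 3, "3", 0.3)),
  c("a", "b", "c", "d")
)

schema <- structType(
  structField("a", "integer"),
  structField("c", "string"),
  structField("avg", "double")
)

result <- SparkR::gapply(
  df, c("a", "c"),
  function(key, x) {
    # Attach the package on the worker, not the driver,
    # so crost() is visible where the function actually executes.
    library(tsintermittent)
    fit <- crost(x$d)
    # Collapse the forecast to a single double so the output
    # matches the one-row-per-key schema above.
    data.frame(key, avg = mean(fit$frc.out, na.rm = TRUE),
               stringsAsFactors = FALSE)
  },
  schema
)

head(collect(result))
```

This didn't make the intermittent deserializer == "byte" error go away for me, so that part may be a separate issue with how the cluster serializes the closure.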