I am trying to write a very simple SparkR program that uses dapply
to transform each line of a text file. However, I cannot get it to run:
lines <- read.text("/path/to/file.txt")
resultingSchema <- structType(structField("line", "string"))
linesmapped <- dapply(lines, function(line) {
y <- list()
y[[1]] <- paste(line[[1]], "1", sep = ":")
}, resultingSchema)
head(linesmapped)
This is the error I get:
Error in split.default(output, seq(nrow(output))) :
group length is 0 but data length > 0
at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:59)
at org.apache.spark.sql.execution.r.MapPartitionsRWrapper.apply(MapPartitionsRWrapper.scala:29)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:178)
at org.apache.spark.sql.execution.MapPartitionsExec$$anonfun$6.apply(objects.scala:175)
Answer 0 (score: 3)
I was making too many basic mistakes. I hope this is useful to others (since the SparkR documentation is currently quite sparse):
lines <- read.text("/path/to/file.txt")
resultingSchema <- structType(structField("value", "string"))
ldf <- dapply(lines, function(x) {
x <- transform(x, value=paste(value, "$", sep=""))
}, resultingSchema)
head(collect(ldf))
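For readers wondering why the original version failed, here is a minimal local sketch in plain R, with no Spark session involved. It assumes the worker hands each partition to the function as an ordinary data.frame; the names `partition` and `question_fn` are made up for illustration. The function from the question ends with an assignment, so it returns the pasted character vector rather than a data.frame, and the `split(output, seq(nrow(output)))` call shown in the error trace then has no rows to group by.

```r
# Hypothetical stand-in for one partition as the worker sees it: a plain data.frame.
partition <- data.frame(value = c("a", "b"), stringsAsFactors = FALSE)

# The function body from the question. Its last expression is an assignment,
# so the function returns the pasted character vector, not a data.frame.
question_fn <- function(line) {
  y <- list()
  y[[1]] <- paste(line[[1]], "1", sep = ":")
}

output <- question_fn(partition)
is.data.frame(output)   # FALSE
is.null(nrow(output))   # TRUE: a plain vector has no rows

# The same split call that appears in the error trace; capture the error message
# instead of stopping the script.
msg <- tryCatch(
  split(output, seq(nrow(output))),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The working version above avoids this because transform() returns a data.frame, and the schema names the column "value", which is the column name read.text produces.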
Answer 1 (score: 0)
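The key thing to remember about dapply is that the function you pass to it receives a data frame as input and must return a data frame as output.
So think of each partition as being handed to that function as a native R data.frame, and write the function accordingly.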
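A minimal sketch of that contract, runnable in plain R without Spark. The data.frame `partition` and the function name `append_suffix` are hypothetical; the single string column is called "value" because that is what read.text produces.

```r
# Hypothetical stand-in for one partition of the text file.
partition <- data.frame(
  value = c("first line", "second line"),
  stringsAsFactors = FALSE
)

# Takes a data.frame and returns a data.frame with the same single column.
append_suffix <- function(df) {
  df$value <- paste(df$value, "1", sep = ":")
  df  # return the whole data.frame explicitly
}

result <- append_suffix(partition)
print(result)
```

Assuming a running SparkR session and the `lines` SparkDataFrame from the question, the same function could then be passed along as dapply(lines, append_suffix, structType(structField("value", "string"))), since the returned column name and type match that schema.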