I have a Spark DataFrame and I am running a function over it with gapplyCollect. It usually fails with the following error:
Error in handleErrors(returnStatus, conn) :
org.apache.spark.SparkException: Job aborted due to stage failure: Task 25 in stage 694.0 failed 4 times, most recent failure: Lost task 25.3 in stage 694.0 (TID 24447, 10.139.64.4, executor 0): org.apache.spark.SparkException: R computation failed with
Error in db.readTypedVector(con, colType, numRows) :
Unsupported type for deserialization: Some message from one of the columns. Calls: <Anonymous> -> lapply -> lapply -> FUN -> db.readTypedVector
Execution halted
at org.apache.spark.api.r.RRunner.compute(RRunner.scala:108)
at org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$14.apply(objects.scala:455)
at org.apache.spark.sql.execution.FlatMapGroupsInRExec$$anonfun$14.apply(objects.scala:432)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:842)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:842)
Any ideas what might be wrong? The message column is short, but its values contain [], :: and - characters. If I truncate the message to just 10 characters, it works. I am using Azure Databricks.
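For context, here is a minimal sketch of the kind of call involved. The column names (group, message) and the data are assumptions, not my real schema. One common cause of "Unsupported type for deserialization" in SparkR is the grouped function returning a data.frame with factor (or other non-atomic) columns, which R versions before 4.0 create by default from character vectors; coercing explicitly to character may avoid it, though I have not confirmed this is the cause here:

```r
library(SparkR)

# Hypothetical reproduction sketch; requires an active Spark session.
df <- createDataFrame(data.frame(
  group   = c("a", "a", "b"),
  message = c("[ERR]::x-1", "[ERR]::y-2", "[OK]::z-3"),
  stringsAsFactors = FALSE
))

result <- gapplyCollect(
  df,
  "group",
  function(key, x) {
    data.frame(
      group = as.character(key[[1]]),
      msg   = as.character(x$message),  # coerce to character: factor
      stringsAsFactors = FALSE          # columns can fail to deserialize
    )
  }
)
```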