Saving the test-results dataframe to CSV gives an error

Time: 2019-07-18 20:35:16

Tags: apache-spark pyspark apache-spark-sql

I have run my test images through the model and everything has worked so far, but now I get an error when saving the test-results dataframe as CSV, and I don't understand why. Total rows in the dataframe: 8000. Here is the code:

from pyspark.ml.classification import LogisticRegressionModel
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml import PipelineModel
from sparkdl import DeepImageFeaturizer

model_test = LogisticRegressionModel.load('/mnt/<mount-name>/Xception')
featurizer_test = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="Xception")

# Pipeline both entities
p_test = PipelineModel(stages=[featurizer_test, model_test])

# Test and evaluate
tested_df_test = p_test.transform(test_df)
evaluator_test = MulticlassClassificationEvaluator(metricName="accuracy")
tested_df_test.printSchema()

==== output ====
image:
  -- origin: string
  -- height: integer
  -- width: integer
  -- nChannels: integer
  -- mode: integer
  -- data: binary
label: integer
features: udt
rawPrediction: udt
probability: udt
prediction: double

Then I select a few columns from that dataframe to build another dataframe:

from pyspark.sql.functions import col
csvDF = tested_df_test.select(
    col("image.origin").alias("filename"),
    col("prediction").alias("predicted_label"),
    col("label").alias("original_label"),
)

# the line below gives the error
csvDF.write.format("com.databricks.spark.csv").mode("overwrite").option("header","true").save("wasbs://testresultcsv@machinelearningdocuments.blob.core.windows.net/models/Xception1.csv")

The following error appears:

Py4JJavaError: An error occurred while calling o1120.save.
: org.apache.spark.SparkException: Job aborted.
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
    at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:192)
.
.
.
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 20 in stage 45.0 failed 4 times, most recent failure: Lost task 20.3 in stage 45.0 (TID 49873, 10.139.64.5, executor 0): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$3: (struct<origin:string,height:int,width:int,nChannels:int,mode:int,data:binary>) => struct<origin:string,height:int,width:int,nChannels:int,mode:int,data:binary>)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
.
.
.
Caused by: java.lang.IllegalArgumentException: requirement failed:  Only one byte per channel is currently supported, got 0 bytes for
 image of size (-1, -1, -1).

    at scala.Predef$.require(Predef.scala:224)
.
.
.
Caused by: org.apache.spark.SparkException: Failed to execute user defined function($anonfun$3: (struct<origin:string,height:int,width:int,nChannels:int,mode:int,data:binary>) => struct<origin:string,height:int,width:int,nChannels:int,mode:int,data:binary>)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.project_doConsume_0$(Unknown Source)
.
.
.
Caused by: java.lang.IllegalArgumentException: requirement failed:  Only one byte per channel is currently supported, got 0 bytes for
 image of size (-1, -1, -1).

    at scala.Predef$.require(Predef.scala:224)

0 Answers:

There are no answers yet.