I am trying to load a set of .jsonl files (newline-delimited JSON) into Spark 1.5.1 (specifically via PySpark) and save them as Parquet files. Here is the high-level code:
from pyspark.sql.types import StructType, StructField, StringType

print "Loading: " + path
# Declare every field as a nullable string; the JSON is read against this schema.
schemaString = "data name version"
fields = StructType([StructField(field_name, StringType(), True) for field_name in schemaString.split(" ")])
dataJson = sqlContext.read.json(path, fields).cache()
parqPath = constructParqPath(path)
print "ParqPath: " + parqPath
dataJson.write.parquet(parqPath)
Running the code with spark-submit raises the following exception:
java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.unsafe.types.UTF8String
How can I explicitly cast the fields, or load the data correctly?
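For context, here is what I suspect is happening, as a minimal pure-Python sketch (the sample record below is hypothetical, assumed to resemble my files): the schema declares every field as StringType, but some JSON values are numbers, so the parser produces integers that cannot be cast to UTF8String when writing the Parquet output.

```python
import json

# Hypothetical .jsonl line (assumed shape): "version" is a JSON number.
line = '{"data": "payload", "name": "example", "version": 2}'
record = json.loads(line)

# The all-StringType schema expects strings, but the parsed value is an
# int here -- which matches the Integer -> UTF8String ClassCastException.
assert isinstance(record["version"], int)

# Stringifying the values up front would make the data match the schema.
record = {k: str(v) for k, v in record.items()}
assert all(isinstance(v, str) for v in record.values())
```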