I am trying to run the following code, which reads files as a DataFrame for a Kafka topic (for Spark Streaming). The job is developed in Scala using the Eclipse IDE and run on the server with spark-submit as a thin jar (the schema is defined appropriately, and no other packages are passed on the command line), and it fails with the error below. I have researched similar errors around `spark.read.option.schema.csv` on the internet, but without success.
Has anyone run into a similar Spark Streaming problem when using the readStream option?
Looking forward to hearing back!
Error:
Exception in thread "main" java.lang.RuntimeException: Multiple sources found for csv (com.databricks.spark.csv.DefaultSource15, org.apache.spark.sql.execution.datasources.csv.CSVFileFormat), please specify the fully qualified class name.
Code:
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("com.databricks.spark.csv").csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("org.apache.spark.sql.execution.datasources.csv").csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("org.apache.spark.sql.execution.datasources.csv.CSVFileFormat").csv("server_path") //does not resolve error
val csvdf = spark.readStream.option("sep", ",").schema(userSchema).format("com.databricks.spark.csv.DefaultSource15").csv("server_path") //does not resolve error
Answer 0 (score: 0)
The pom.xml did not explicitly pull in a spark-csv jar.
It turned out that the server's HDP path holding the jars for Spark2 contained both the spark-csv and spark-sql jars, which caused the conflicting CSV source problem. Removing the redundant spark-csv jar from that path resolved the issue.
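In this case the duplicate jar sat on the server classpath, so deleting it was the fix. If the conflict had instead entered through the build, it could be excluded in pom.xml. The sketch below is hypothetical (the artifact versions shown are assumptions, not from the original post); `com.databricks:spark-csv` is the external package that duplicates the CSV source already built into Spark 2.x:

```xml
<!-- Hypothetical pom.xml sketch: exclude the external spark-csv package
     if some dependency drags it in transitively. Spark 2.x already ships
     a built-in CSV source, so the external one is redundant. -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.11</artifactId>
  <version>2.3.0</version> <!-- assumed version, adjust to your cluster -->
  <exclusions>
    <exclusion>
      <groupId>com.databricks</groupId>
      <artifactId>spark-csv_2.11</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Running `mvn dependency:tree` can confirm whether any dependency still brings in `com.databricks:spark-csv` after the exclusion.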