I'm using Spark 1.6 to read a CSV file, writing the code in Java:
URL resource = Main.class.getResource("GlobalLandTemperaturesByCountry.csv");
File filePath = Paths.get(resource.toURI()).toFile();
JavaSparkContext jsc = new JavaSparkContext("local","Java Spark example");
SQLContext sqlContext = new SQLContext(jsc);
DataFrame dataFrame = sqlContext.read()
.format("csv")
.option("header", "true")
.load(filePath.getAbsolutePath());
dataFrame.show();
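But I get:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark-packages.org
What am I doing wrong? Does my version not have a CSV parser? The path is correct. Please help.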
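Since the exception names the data source rather than the file, the path is probably fine. If you want to rule it out anyway, the getResource/Paths.get pair used above can fail in two quiet ways: getResource returns null when the file is not on the classpath (which then surfaces as a NullPointerException), and Paths.get cannot turn a resource packaged inside a JAR into a File. A minimal sketch of a defensive check, using a hypothetical helper class ResourcePathCheck:

```java
import java.io.File;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.file.Paths;

class ResourcePathCheck {

    // Resolves a classpath resource (looked up relative to the given class,
    // as Main.class.getResource(...) does) to an absolute file path.
    // Fails with a descriptive message instead of a NullPointerException.
    static String resolveResource(Class<?> anchor, String name) {
        URL resource = anchor.getResource(name);
        if (resource == null) {
            throw new IllegalArgumentException("Resource not found on classpath: " + name);
        }
        // Paths.get(URI) only handles "file:" URLs; a resource inside a JAR
        // has a "jar:" URL and cannot be converted to a File.
        if (!"file".equals(resource.getProtocol())) {
            throw new IllegalArgumentException("Resource is not a plain file: " + resource);
        }
        try {
            File file = Paths.get(resource.toURI()).toFile();
            return file.getAbsolutePath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Bad resource URI: " + resource, e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "GlobalLandTemperaturesByCountry.csv";
        try {
            System.out.println("Resolved: " + resolveResource(ResourcePathCheck.class, name));
        } catch (IllegalArgumentException e) {
            System.out.println("Problem: " + e.getMessage());
        }
    }
}
```

If this prints a path that exists, the file is not the problem and the error really is about the missing "csv" data source.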
Answer (score: 1)
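Spark 1.x has no built-in "csv" data source (CSV support became built in with Spark 2.0), so change format("csv") to format("com.databricks.spark.csv") and add the spark-csv dependency: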
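<!-- https://mvnrepository.com/artifact/com.databricks/spark-csv -->
<!-- The _2.11 suffix is the Scala version; it must match the Scala version of your Spark build (use spark-csv_2.10 for a Scala 2.10 build) -->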
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.11</artifactId>
<version>1.5.0</version>
</dependency>
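On Spark 2.0 and later the short name "csv" is built in (and the old "com.databricks.spark.csv" name is mapped onto that built-in source), so only 1.x strictly needs the package name. If one code base has to target both, a tiny helper can pick the name from the version string. This is a minimal sketch; CsvFormatChooser and csvFormatFor are hypothetical names, not part of Spark:

```java
class CsvFormatChooser {

    // Picks the name to pass to DataFrameReader.format(...) for CSV input.
    // Assumption: Spark 1.x needs the external spark-csv package, registered as
    // "com.databricks.spark.csv"; Spark 2.0+ has a built-in source named "csv".
    static String csvFormatFor(String sparkVersion) {
        // "1.6.3" -> major version 1; suffixes like "2.0.0-SNAPSHOT" are ignored
        String major = sparkVersion.trim().split("\\.")[0];
        return Integer.parseInt(major) < 2 ? "com.databricks.spark.csv" : "csv";
    }

    public static void main(String[] args) {
        System.out.println(csvFormatFor("1.6.3")); // com.databricks.spark.csv
        System.out.println(csvFormatFor("2.4.8")); // csv
    }
}
```

In the code below this could be used as .format(CsvFormatChooser.csvFormatFor(jsc.version())), where jsc.version() returns the running Spark version as a string. On 1.x the spark-csv dependency is still required either way.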
Resulting code:
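import java.io.File;
import java.net.URL;
import java.nio.file.Paths;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

URL resource = Main.class.getResource("GlobalLandTemperaturesByCountry.csv");
File filePath = Paths.get(resource.toURI()).toFile();
JavaSparkContext jsc = new JavaSparkContext("local","Java Spark example");
SQLContext sqlContext = new SQLContext(jsc);
// Spark 1.x: "csv" is not built in, so use the spark-csv package's data source name
DataFrame dataFrame = sqlContext.read()
.format("com.databricks.spark.csv")
.option("inferSchema", "true") // detect column types instead of reading every column as a string
.option("header", "true")      // take column names from the first line
.load(filePath.getAbsolutePath());
dataFrame.show();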
It worked!
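The two options in the working code change what you get back: header=true takes the column names from the first line, and inferSchema=true scans the data (an extra pass over it) to pick column types instead of reading everything as strings. The sketch below is a deliberately simplified, hypothetical illustration of that idea: SchemaSketch and inferSchema are made-up names, the sample rows are invented, and fields are split naively on commas with no quoting. spark-csv's real inference handles quoted fields and recognizes more types than this.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaSketch {

    private static final String[] TYPE_NAMES = {"INTEGER", "DOUBLE", "STRING"};

    // Hypothetical simplification of header=true plus inferSchema=true:
    // the first line supplies column names, and each column gets the narrowest
    // of INTEGER < DOUBLE < STRING that fits every non-empty value in it.
    // Fields are split naively on commas (no quotes or escaped commas).
    static Map<String, String> inferSchema(List<String> lines) {
        String[] header = lines.get(0).split(",", -1);
        int[] rank = new int[header.length]; // index into TYPE_NAMES, starts at INTEGER
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(",", -1);
            for (int i = 0; i < header.length && i < cells.length; i++) {
                // Widen the column type if this value does not fit the current one
                rank[i] = Math.max(rank[i], rankOf(cells[i].trim()));
            }
        }
        Map<String, String> schema = new LinkedHashMap<>();
        for (int i = 0; i < header.length; i++) {
            schema.put(header[i].trim(), TYPE_NAMES[rank[i]]);
        }
        return schema;
    }

    // Narrowest type that can hold one value; empty cells (nulls) fit anything.
    private static int rankOf(String value) {
        if (value.isEmpty()) {
            return 0;
        }
        try {
            Integer.parseInt(value);
            return 0;
        } catch (NumberFormatException notInt) {
            // fall through to the next, wider type
        }
        try {
            Double.parseDouble(value);
            return 1;
        } catch (NumberFormatException notDouble) {
            return 2;
        }
    }

    public static void main(String[] args) {
        // Made-up sample rows, not taken from the real file
        List<String> lines = Arrays.asList(
                "date,temp,samples,country",
                "2000-01-01,4.38,12,Alpha",
                "2000-02-01,,9,Beta");
        System.out.println(inferSchema(lines));
        // prints {date=STRING, temp=DOUBLE, samples=INTEGER, country=STRING}
    }
}
```

Without inferSchema, the same file would come back with every column typed as a string, which is why the answer adds that option alongside header.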