Java Spark 1.6 CSV file

Time: 2018-07-10 08:43:34

Tags: java csv apache-spark

I am using Spark 1.6 to read a CSV file, coding in Java:

    URL resource = Main.class.getResource("GlobalLandTemperaturesByCountry.csv");
    File filePath = Paths.get(resource.toURI()).toFile();

    JavaSparkContext jsc = new JavaSparkContext("local","Java Spark example");
    SQLContext sqlContext = new SQLContext(jsc);

    DataFrame dataFrame = sqlContext.read()
            .format("csv")
            .option("header", "true")
            .load(filePath.getAbsolutePath());
    dataFrame.show();

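But I get:

    Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: csv. Please find packages at http://spark-packages.org

What am I doing wrong? Does my version not have a CSV parser? The path is correct. Please help.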

1 answer:

Answer 0: (score: 1)

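Change format("csv") to .format("com.databricks.spark.csv") and add the dependency below. Spark 1.6 has no built-in csv data source (one was added in Spark 2.0), so the short name "csv" cannot be resolved unless the spark-csv package is on the classpath. The _2.11 suffix of the artifact should match the Scala version your Spark artifacts were built for.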

    <!-- https://mvnrepository.com/artifact/com.databricks/spark-csv -->
    <dependency>
        <groupId>com.databricks</groupId>
        <artifactId>spark-csv_2.11</artifactId>
        <version>1.5.0</version>
    </dependency>
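If the exception persists after adding the dependency, it can help to confirm that the package actually reached the runtime classpath. The following is only a minimal sketch using plain Java reflection. The class name `com.databricks.spark.csv.DefaultSource` is my understanding of the entry class that spark-csv provides for this format name, and `SparkCsvClasspathCheck` and `isOnClasspath` are hypothetical names made up for illustration.

```java
final class SparkCsvClasspathCheck {

    // Assumed entry class of the spark-csv package for the
    // "com.databricks.spark.csv" format name.
    static final String SPARK_CSV_SOURCE = "com.databricks.spark.csv.DefaultSource";

    /** Returns true if the named class can be located on the current classpath. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, SparkCsvClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isOnClasspath(SPARK_CSV_SOURCE)) {
            System.out.println("spark-csv found on the classpath");
        } else {
            System.out.println("spark-csv is missing; check the dependency and its Scala suffix");
        }
    }
}
```

Running this with the same classpath as the Spark job should print the first message once the dependency is resolved correctly.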

Resulting code:

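    // Imports at the top of Main.java
    import java.io.File;
    import java.net.URL;
    import java.nio.file.Paths;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.sql.DataFrame;
    import org.apache.spark.sql.SQLContext;

    // Resolve the CSV resource on the classpath to a file
    URL resource = Main.class.getResource("GlobalLandTemperaturesByCountry.csv");
    File filePath = Paths.get(resource.toURI()).toFile();

    JavaSparkContext jsc = new JavaSparkContext("local", "Java Spark example");
    SQLContext sqlContext = new SQLContext(jsc);

    // Spark 1.6 has no built-in "csv" source, so name the spark-csv package explicitly
    DataFrame dataFrame = sqlContext.read()
            .format("com.databricks.spark.csv")
            .option("inferSchema", "true")  // infer column types instead of reading all columns as strings
            .option("header", "true")       // treat the first line as column names
            .load(filePath.getAbsolutePath());
    dataFrame.show();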

It works!
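For code that has to run on both old and new Spark versions, the format name can be chosen from the version string. This is only a sketch of the idea, based on the fact that the built-in "csv" source arrived in Spark 2.0. `CsvSourceName` and `forSparkVersion` are hypothetical helper names, and in a real application the version string would presumably come from something like `jsc.version()` rather than a literal.

```java
final class CsvSourceName {

    /**
     * Picks the CSV data source name for a Spark version string such as "1.6.0".
     * Spark 2.0 and later ship a built-in "csv" source; earlier releases
     * need the external spark-csv package.
     */
    static String forSparkVersion(String sparkVersion) {
        // The major version is the number before the first dot, e.g. "1" in "1.6.0".
        String trimmed = sparkVersion.trim();
        int dot = trimmed.indexOf('.');
        int major = Integer.parseInt(dot < 0 ? trimmed : trimmed.substring(0, dot));
        return major >= 2 ? "csv" : "com.databricks.spark.csv";
    }

    public static void main(String[] args) {
        System.out.println(forSparkVersion("1.6.0")); // prints com.databricks.spark.csv
        System.out.println(forSparkVersion("2.4.8")); // prints csv
    }
}
```

The returned string would then be passed to `.format(...)` in place of the hard-coded name.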