Writing a Parquet file from spark-csv

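Time: 2017-04-11 08:02:58

Tags: java csv apache-spark parquet spark-csv

I want to convert a CSV file to Parquet using spark-csv. Reading the file and holding it as a Dataset works. Unfortunately, I cannot write it back out as a Parquet file. Is there a way to achieve this?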

SparkSession spark = SparkSession.builder().appName("Java Spark SQL basic example")
        .config("spark.master", "local").config("spark.sql.warehouse.dir", "file:///C:\\spark_warehouse")
        .getOrCreate();

Dataset<Row> df = spark.read().format("com.databricks.spark.csv").option("inferSchema", "true")
        .option("header", "true").load("sample.csv");

df.write().parquet("test.parquet");
  

17/04/11 09:57:32 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 3)
java.lang.NoSuchMethodError: org.apache.parquet.column.ParquetProperties.builder()Lorg/apache/parquet/column/ParquetProperties$Builder;
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:362)
    at org.apache.parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:350)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetOutputWriter.<init>(ParquetOutputWriter.scala:37)
    at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anon$1.newInstance(ParquetFileFormat.scala:145)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.<init>(FileFormatWriter.scala:234)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:182)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:129)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$3.apply(FileFormatWriter.scala:128)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

1 Answer:

Answer 0 (score: 1)

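I fixed it with a workaround. I had to comment out the two Parquet dependencies in my build, though I'm not sure why they conflict with each other.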

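
A `NoSuchMethodError` like the one on `ParquetProperties.builder()` in the trace above is what the JVM throws when the calling class was compiled against a version of a class that differs from the one actually loaded at run time. That would fit the workaround: explicitly declared Parquet artifacts can end up mixed with a different Parquet version that Spark brings in. One way to check is to print which jar each class is loaded from. The sketch below is only an illustration; `ClassOrigin` and `locate` are hypothetical helper names, not part of Spark or Parquet, and the two class names are taken from the stack trace.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not part of Spark or Parquet):
// reports which jar or directory a class is loaded from.
class ClassOrigin {

    // Returns the code-source location of the named class, or a short
    // message when the class is missing or has no recorded location.
    static String locate(String className) {
        try {
            // 'false' skips static initialization; we only need the Class object.
            Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "bootstrap or unknown source";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "not on classpath";
        } catch (LinkageError e) {
            return "failed to load: " + e;
        }
    }

    public static void main(String[] args) {
        // Class names copied from the stack trace in the question.
        String[] names = {
            "org.apache.parquet.column.ParquetProperties",
            "org.apache.parquet.hadoop.ParquetOutputFormat"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If the two classes resolve to jars with different Parquet versions, that mismatch is a likely cause of the error, and keeping only the Parquet version that ships with Spark (as the workaround does) would be consistent with it going away.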