Scala: Writing data to an S3 bucket

Asked: 2018-11-18 18:40:11

Tags: amazon-web-services apache-spark amazon-s3 apache-spark-sql

I am trying to write data to an S3 bucket, but I am getting the following error:

SQLHadoopMapReduceCommitProtocol: Using output committer class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
18/11/18 23:32:14 ERROR Utils: Aborting task
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
    at org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Native Method)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
18/11/18 23:32:14 WARN FileOutputCommitter: Could not delete s3a://Accesskey:SecretKey@test-bucket/Output/Check1Result/_temporary/0/_temporary/attempt_20181118233210_0004_m_000000_0
18/11/18 23:32:14 ERROR FileFormatWriter: Job job_20181118233210_0004 aborted.
18/11/18 23:32:14 ERROR Executor: Exception in task 0.0 in stage 4.0 (TID 209)
org.apache.spark.SparkException: Task failed while writing rows.

With the code below I can write the data to the local file system without any problem. However, when I try to write the data to the S3 bucket, I get the error above.

My code:

package Spark_package

import org.apache.spark.sql.SparkSession

object dataload {
  def main(args: Array[String]): Unit = {
    // Build a local SparkSession and switch to the v2 file output committer algorithm.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("dataload")
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()
    val sc = spark.sparkContext
    sc.hadoopConfiguration.set("mapreduce.fileoutputcommitter.algorithm.version", "2")

    // Read the input CSV with a header row and inferred schema.
    val data = "C:\\docs\\Input_Market.csv"
    val ddf = spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .option("delimiter", ",")
      .load(data)

    // Aggregate per (cust_id, sum_cnt) and write the result to S3 as CSV.
    ddf.createOrReplaceTempView("data")
    val res = spark.sql("select count(*), cust_id, sum_cnt from data group by cust_id, sum_cnt")
    res.write.option("header", "true").format("csv").save("s3a://Accesskey:SecretKey@test-bucket/Output/Check1Result1")

    spark.stop()
  }
}
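
For reference: an UnsatisfiedLinkError on NativeIO$Windows.access0 is the symptom Hadoop shows on Windows when its native binaries (winutils.exe / hadoop.dll) are not available, and S3A credentials are normally supplied through the fs.s3a.* Hadoop properties rather than embedded in the s3a:// URI. Below is a minimal sketch of that style of configuration, not the original code from the question; it assumes hadoop-aws is on the classpath, and the winutils path and key values are placeholders.

import org.apache.spark.sql.SparkSession

object S3ConfigSketch {
  def main(args: Array[String]): Unit = {
    // On Windows, Hadoop's NativeIO looks for winutils.exe/hadoop.dll under %HADOOP_HOME%\bin.
    // "C:\\hadoop" is a placeholder; point it at wherever those binaries are installed.
    System.setProperty("hadoop.home.dir", "C:\\hadoop")

    val spark = SparkSession.builder
      .master("local[*]")
      .appName("s3a-config-sketch")
      .config("spark.hadoop.fs.s3a.access.key", "<ACCESS_KEY>") // placeholder value
      .config("spark.hadoop.fs.s3a.secret.key", "<SECRET_KEY>") // placeholder value
      .getOrCreate()

    // With the keys in the configuration, the s3a:// URI carries no credentials.
    val df = spark.read.option("header", "true").csv("C:\\docs\\Input_Market.csv")
    df.write.option("header", "true").csv("s3a://test-bucket/Output/Check1Result1")

    spark.stop()
  }
}

Whether this resolves the failure above is not certain, but it separates the two usual suspects: the missing Windows native libraries and the credentials embedded in the URI.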

0 Answers

No answers yet.