I'm new to Scala and am trying to programmatically read a tab-separated values file from S3 and load the data into a CSV file.
Whenever I run the Scala application, I get the following error:
Class org.apache.hadoop.fs.s3native.NativeS3FileSystem not found
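If it matters: this class ships in Hadoop's S3 support module (hadoop-aws on Hadoop 2.6+, which in turn relies on jets3t for the native S3 filesystem), so one possible cause is that the jar is simply not on the application classpath. A sketch of the dependencies that might be missing (version numbers are placeholders, not verified, and must match the Hadoop version bundled with the Spark distribution in use):

```scala
// build.sbt (sketch; versions are assumptions)
libraryDependencies ++= Seq(
  "org.apache.hadoop"   % "hadoop-aws" % "2.7.7",
  "net.java.dev.jets3t" % "jets3t"     % "0.9.4"
)
```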
Scala version: 2.12
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.sql.SQLContext

val conf = new SparkConf()
  .setAppName("StreamLogic")
  .setMaster("local")
val sc = new SparkContext(conf)

// Point the s3:// scheme at the native S3 filesystem and supply credentials
val hadoopConf = sc.hadoopConfiguration
hadoopConf.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId", "awsAccessKeyId")
hadoopConf.set("fs.s3.awsSecretAccessKey", "awsSecretAccessKey")

val ssc = new StreamingContext(sc, Seconds(60))

// Read the gzipped TSV from S3, pick two columns, write them out as CSV
val sqlContext = new SQLContext(sc)
val segments = sqlContext.read.format("com.databricks.spark.csv")
  .option("delimiter", "\t")
  .load("s3://awss3bucket/tsv/inputfile.tsv.gz")

val selectedData = segments.select("C11", "C12")
selectedData.write
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .save("/home/sparkuser/output2.csv")
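(As a possible alternative, untested here: Spark 2.x bundles a csv data source, so the same pipeline could be written without the external com.databricks.spark.csv package. The column names below are an assumption, since the built-in reader defaults to _c0, _c1, … rather than C0, C1, …)

```scala
// Sketch assuming Spark 2.x, where csv is a built-in data source.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("StreamLogic")
  .master("local")
  .getOrCreate()

val segments2 = spark.read
  .option("delimiter", "\t")
  .csv("s3://awss3bucket/tsv/inputfile.tsv.gz")

// Built-in reader names columns _c0, _c1, ... when there is no header
segments2.select("_c11", "_c12")
  .write.option("header", "true")
  .csv("/home/sparkuser/output2")
```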
I expect to read each line from the TSV and load it into the CSV.
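Independent of Spark, the per-line transformation I'm after is just splitting each line on tabs and re-joining the fields with commas; a minimal standard-library sketch of that step (CSV quoting for fields that themselves contain commas is ignored here):

```scala
// Minimal sketch, Scala standard library only: one TSV line -> one CSV line.
object TsvToCsv {
  // split with limit -1 keeps trailing empty fields
  def convertLine(line: String): String =
    line.split("\t", -1).mkString(",")

  def main(args: Array[String]): Unit = {
    println(convertLine("a\tb\tc")) // a,b,c
  }
}
```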