I get an exception when trying to read files from S3 with Spark. The error and code are below. The folder contains a number of files produced as Hadoop output, named part-00000, part-00001, etc., ranging in size from 0 KB to a few GB.
16/04/07 15:38:58 INFO NativeS3FileSystem: Opening key 'titlematching214/1.0/bypublicdemand/part-00000' for reading at position '0'
16/04/07 15:38:58 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.S3ServiceException: S3 GET failed for '/titlematching214%2F1.0%2Fbypublicdemand%2Fpart-00000' XML Error Message:
InvalidRange
The requested range is not satisfiable
bytes=0-0 1AED523DF401F17EC BYUH1h3WkC7/G8/EFE/YyHbzxoNTpRBiX6QMy2RXHur17lYTZXd7XxOWivmqIpu0F7Xx5zdWns=
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadMatches {
  def main(args: Array[String]): Unit = {
    val config = new SparkConf().setAppName("RunAll").setMaster("local")
    val sc = new SparkContext(config)

    val hadoopConf = sc.hadoopConfiguration
    hadoopConf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem")
    hadoopConf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem")
    hadoopConf.set("fs.s3n.awsAccessKeyId", "myRealKeyId")
    hadoopConf.set("fs.s3n.awsSecretAccessKey", "realKey")

    val sqlContext = new SQLContext(sc)
    // Read the part files as lines of text, then parse them as JSON.
    val dataset = sc.textFile("s3n://altvissparkoutput/titlematching214/1.0/*/*")
    val ebayRaw = sqlContext.read.json(dataset)
    val data = ebayRaw.first()
  }
}
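(It may be relevant that some of the part files are 0 KB: the s3n connector, backed by jets3t, opens each object with a ranged GET starting at byte 0, and S3 answers InvalidRange for a zero-length object. Below is a minimal sketch, assuming that diagnosis, that expands the same glob and skips empty files before reading; the listing logic is illustrative, not from the original post.)

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, Path}

// Expand the same glob the job uses and keep only non-empty files,
// since the InvalidRange error points at 0-byte objects.
val pattern = "s3n://altvissparkoutput/titlematching214/1.0/*/*"
val fs = FileSystem.get(new URI(pattern), sc.hadoopConfiguration)
val nonEmptyPaths = fs
  .globStatus(new Path(pattern))
  .filter(status => status.isFile && status.getLen > 0)
  .map(_.getPath.toString)

// sc.textFile accepts a comma-separated list of paths.
val dataset = sc.textFile(nonEmptyPaths.mkString(","))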
Answer (score: 0)
You can read the dataset directly from S3:
val dataset = "s3n://altvissparkoutput/titlematching214/1.0/*/*"
val ebayRaw = sqlContext.read.json(dataset)
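For completeness, here is a minimal, self-contained version of that suggestion; a sketch assuming the same credentials and paths as in the question:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ReadMatchesDirect {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RunAll").setMaster("local"))
    sc.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", "myRealKeyId")
    sc.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", "realKey")

    val sqlContext = new SQLContext(sc)
    // read.json takes the S3 glob directly; no intermediate textFile RDD.
    val ebayRaw = sqlContext.read.json("s3n://altvissparkoutput/titlematching214/1.0/*/*")
    println(ebayRaw.first())

    sc.stop()
  }
}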