I have written the following code to read a zip file that is not password-protected:
import java.util.zip.ZipInputStream
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.input.PortableDataStream
import scala.io.Source

val sparkConf = new SparkConf().setMaster("local[2]").setAppName("CSVProcessSPark") // create a new Spark config
val sc = new SparkContext(sparkConf)
sc.binaryFiles("hdfs://localhost:8020/user/cloudera/HWZ.zip", 1) // make an RDD from *.zip files in HDFS
  .flatMap { case (_: String, stream: PortableDataStream) => // flatMap to unzip each file
    val zipStream = new ZipInputStream(stream.open()) // open a java.util.zip.ZipInputStream
    zipStream.getNextEntry() // position the stream at the first entry
    val iter = Source.fromInputStream(zipStream).getLines() // place the entry's lines into an iterator
    iter.drop(1) // skip the first (header) line and return the rest; safe on an empty entry
  }
  .saveAsTextFile("hdfs://localhost:8020/user/cloudera/result.csv")
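For reference, the per-file unzip step in the flatMap above can be checked locally without Spark or HDFS. This is only a minimal sketch: `UnzipCheck`, `linesAfterHeader`, and `sampleZip` are names made up for illustration, the sample CSV content is invented, and it covers unencrypted archives only, since the `java.util.zip.ZipInputStream` constructors take no password argument.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{ZipEntry, ZipInputStream, ZipOutputStream}
import scala.io.Source

object UnzipCheck {
  // Hypothetical helper: same logic as the flatMap body, applied to one stream.
  // Reads the first zip entry and returns its lines without the header line.
  def linesAfterHeader(in: InputStream): List[String] = {
    val zipStream = new ZipInputStream(in)
    try {
      if (zipStream.getNextEntry() == null) Nil // archive has no entries
      else Source.fromInputStream(zipStream, "UTF-8").getLines().drop(1).toList
    } finally zipStream.close()
  }

  // Hypothetical helper: builds a one-entry, unencrypted zip in memory.
  def sampleZip(csv: String): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val out = new ZipOutputStream(bytes)
    out.putNextEntry(new ZipEntry("sample.csv"))
    out.write(csv.getBytes(StandardCharsets.UTF_8))
    out.closeEntry()
    out.close()
    bytes.toByteArray
  }

  def main(args: Array[String]): Unit = {
    val zipped = sampleZip("id,name\n1,a\n2,b\n")
    // Prints the two data rows; the "id,name" header is dropped.
    linesAfterHeader(new ByteArrayInputStream(zipped)).foreach(println)
  }
}
```

The lines are materialized with `toList` before the stream is closed, which is fine for a small local check; inside the Spark job the lazy iterator is kept so that large entries are not loaded into memory at once.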
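I tried setting the password in the Spark context's local properties, but I still cannot read a password-protected zip file.
Please suggest a way to read a password-protected zip file with Apache Spark.
Thanks in advance.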