How do I filter a Seq[Array[String]] in Scala?

Asked: 2017-02-09 01:44:42

Tags: scala amazon-s3 filter

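I am trying to count how many files exist in an S3 bucket. I have a list of paths as a Seq that I want to check. I tried to filter the paths and count the result, but I keep getting an error.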

import java.net.URI
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.fs.Path

files: Seq[String] = Vector(s3://dv-service-prod-na/output/sample/test/data/2016/12/01/*/*, s3://dv-service-prod-na/output/sample/test/data/2016/12/02/*/*, s3://dv-service-prod-na/output/sample/test/data/2016/12/03/*/*, s3://dv-service-prod-na/output/sample/test/data/2016/12/04/*/*, s3://dv-service-prod-na/output/sample/test/data/2016/12/05/*/*)

val filePath = files.map(x=> x.split("/\\*/\\*"))
val input = "s3n://dv-service-prod-na"
val missingPath = filePath.filter(x => (FileSystem.get(new URI(input), sc.hadoopConfiguration).exists(new Path(x))).equals(false)).count

Error:

<console>:92: error: overloaded method constructor Path with alternatives: (x$1: java.net.URI)org.apache.hadoop.fs.Path <and> (x$1: String)org.apache.hadoop.fs.Path cannot be applied to (Array[String])

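1 Answer:

Answer 0: (score: 3)

You probably want to flatten after splitting, so that each element is a String rather than an Array[String]: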

val filePath = files.flatMap(x=> x.split("/\\*/\\*"))
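
To make the difference concrete, here is a minimal sketch that runs without Hadoop or Spark. It uses two of the paths from the question as string literals, and `pathExists` is a hypothetical stand-in for the `FileSystem.exists(new Path(...))` call, not a real API. Note also that on a plain `Seq`, `count` takes a predicate, so the trailing `.filter(...).count` from the question would likely need to become `.size` or a single `count(...)` call.

```scala
// Two of the paths from the question, written as string literals.
val files: Seq[String] = Vector(
  "s3://dv-service-prod-na/output/sample/test/data/2016/12/01/*/*",
  "s3://dv-service-prod-na/output/sample/test/data/2016/12/02/*/*"
)

// map keeps one Array per input string, so each element is an Array[String].
// Passing such an element to new Path(...) is what triggers the overload error.
val nested: Seq[Array[String]] = files.map(_.split("/\\*/\\*"))

// flatMap concatenates the arrays, so each element is a plain String.
val filePath: Seq[String] = files.flatMap(_.split("/\\*/\\*"))

// Hypothetical stand-in for FileSystem.exists(new Path(p)); it pretends that
// only the 2016/12/01 directory is present.
def pathExists(p: String): Boolean = p.endsWith("/2016/12/01")

// count takes a predicate, so no separate filter step is needed.
val missingPath: Int = filePath.count(p => !pathExists(p))
```

With a real bucket, the body of `pathExists` would presumably be the existence check from the question, ideally with the `FileSystem` instance obtained once outside the loop rather than once per path.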