I have two CSV files (datasets), file1 and file2.
File1 consists of the following columns:
Orders  | Requests | Book1   | Book2
Varchar | Integer  | Integer | Integer
File2 consists of the following columns:
Book3  | Book4  | Book5   | Orders
String | String | Varchar | Varchar
How can I combine the data from these two CSV files in Scala to check:
Answer 0 (score: 0)
You can join the two CSV files by building pair RDDs keyed on the join column.
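// job, leftFile and LineParser come from the answerer's own code and are not shown here.
// LineParser is assumed to split a line on job.delimRegex and expose the key column via
// getKey() (and, for the right file, the value column via getValue()).
val rightFile = job.patch.get.file
// Key every line of the right file by its join column
val rightFileByKeys = sc.textFile(rightFile).map { line =>
  new LineParser(line, job.patch.get.patchKeyIndex, job.delimRegex, Some(job.patch.get.patchValueIndex))
}.keyBy(_.getKey())
// Key every line of the left file by its join column
val leftFileByKeys = sc.textFile(leftFile).map { line =>
  new LineParser(line, job.patch.get.fileKeyIndex, job.delimRegex)
}.keyBy(_.getKey())
// Inner join on the key, then append the right-hand value to the left line
leftFileByKeys.join(rightFileByKeys).map { case (_, (left, right)) =>
  (job, left.line + job.delim + right.getValue())
}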
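The pair-RDD join matches every left line with every right line that has the same key. Below is a minimal sketch of that idea using plain Scala collections instead of Spark, so it runs without a cluster. It assumes comma-delimited files with a header row, no quoted or embedded commas, and `Orders` at index 0 in file1 and index 3 in file2 (per the column lists in the question). The names `CsvJoinSketch`, `keyBy`, `innerJoin` and `combine`, and the sample rows, are hypothetical and only for illustration.

```scala
// Plain-Scala sketch of an inner join on the "Orders" column (no Spark).
// Assumptions: comma-delimited lines, one header row, no quoted commas,
// Orders is column 0 in file1 and column 3 in file2.
object CsvJoinSketch {

  // Turn CSV lines into (key, columns) pairs, skipping the header row.
  def keyBy(lines: Seq[String], keyIndex: Int): Seq[(String, Array[String])] =
    lines.drop(1).map { line =>
      val cols = line.split(",", -1).map(_.trim)
      (cols(keyIndex), cols)
    }

  // Inner join: each left row is paired with every right row sharing its key,
  // which is the same semantics as join on an RDD[(K, V)].
  def innerJoin(
      left: Seq[(String, Array[String])],
      right: Seq[(String, Array[String])]
  ): Seq[(String, (Array[String], Array[String]))] = {
    val rightByKey = right.groupBy(_._1)
    for {
      (key, l) <- left
      (_, r) <- rightByKey.getOrElse(key, Seq.empty)
    } yield (key, (l, r))
  }

  // Build one output line: all left columns, then the right columns
  // without the duplicated key column.
  def combine(l: Array[String], r: Array[String], rightKeyIndex: Int): String =
    (l ++ r.take(rightKeyIndex) ++ r.drop(rightKeyIndex + 1)).mkString(",")

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data shaped like file1 and file2
    val file1 = Seq("Orders,Requests,Book1,Book2", "o1,5,1,2", "o2,3,0,1")
    val file2 = Seq("Book3,Book4,Book5,Orders", "a,b,c,o1", "d,e,f,o3")
    val joined = innerJoin(keyBy(file1, 0), keyBy(file2, 3))
    joined.foreach { case (_, (l, r)) => println(combine(l, r, 3)) }
  }
}
```

Running `main` prints `o1,5,1,2,a,b,c`: only `o1` appears in both files, and the duplicated `Orders` column from file2 is dropped. The sketch skips the header row explicitly, whereas `sc.textFile` in the Spark version returns the header line as ordinary data, so (assuming the key indexes point at the `Orders` column) the two header lines would also be joined unless they are filtered out first.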