Creating a new RDD in Spark using wholeTextFiles

Time: 2015-12-01 20:19:40

Tags: apache-spark

I am trying to read a directory using Spark's wholeTextFiles. My file RDD contains (String, String) pairs, where the first String is the file name and the second is the file contents.

I want to map this RDD to another RDD containing only the file contents. How can I do that?

Thanks!

val file = sc.wholeTextFiles("./Desktop/093")

file.first
res0: (String, String) = 
(file:/Users/Desktop/093/nc-no-na.clusters.093.001.txt,"199 197 5   5   168 0   0.932125    11101111000000110100000000000000000000000000001010100000011100001000000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001101101111100000000000000000011100000000000000000000000000100000111011000000000000000000000000000000000000000000000000000000000000000011110010111001001110000000011100000000010000000000000000000000000010000000000000000000000000000000000000000011111111111101010111000000000000000000000000000000000000000000000000000000000000000001100000000000000000000000000000000000000000101110101110101011010000000000000000001100001100000011110000000000000000000011111011110011100...

1 Answer:

Answer 0 (score: 0)

For example:

import org.apache.spark.rdd.RDD

val content: RDD[String] = file.map(_._2)
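
Since wholeTextFiles returns a pair RDD (RDD[(String, String)]), an equivalent way to write this is with `values` from PairRDDFunctions, which selects the second element of every tuple. A minimal sketch, assuming `sc` is an existing SparkContext and the same directory path as above:

```scala
import org.apache.spark.rdd.RDD

// Read (fileName, fileContents) pairs from the directory.
val file: RDD[(String, String)] = sc.wholeTextFiles("./Desktop/093")

// `values` on a pair RDD is equivalent to map(_._2):
// it keeps only the second tuple element, i.e. the file contents.
val content: RDD[String] = file.values
```

Both forms produce the same RDD[String]; `values` simply states the intent (drop the keys) more directly.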