I am trying to use wholeTextFiles to get a pair RDD from my data, but since I am new to Spark I am a bit confused. Here is the code:
val wholefiles = sc.wholeTextFiles("sqoop_import/orders")
wholefiles: org.apache.spark.rdd.RDD[(String, String)] = sqoop_import/orders MapPartitionsRDD[72] at wholeTextFiles at <console>:27
wholefiles.take(5).foreach(println)
(hdfs://filename, 1, 2013-07-25 00:00:00.0,11599,CLOSED
2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT
3,2013-07-25 00:00:00.0,12111,COMPLETE
4,2013-07-25 00:00:00.0,8827,CLOSED)
How can I get a pair RDD of column 4 and column 1 from the data above?
Answer 0 (score: 1)
You can use the following code:
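wholefiles.map(record => record._2)       // keep the file content, drop the file path
.map(lines => lines.split("\n"))          // split each file's content into lines
.flatMap(lines => lines)                  // flatten to one element per line
.map(line => line.split(","))             // split each line into its fields
.map(fields => (fields(3), fields(0)))    // pair (column 4, column 1)
.collect()                                // returns an Array to the driver; omit it to keep the pair RDD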
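The per-file parsing step can also be pulled out into a small pure function, which makes it easy to check without a cluster. The following is only a sketch: `parseOrders` and `sample` are hypothetical names (not part of Spark), and it assumes every valid line has at least four comma-separated fields; shorter or empty lines are skipped rather than causing an index error.

```scala
// Hypothetical helper, not part of Spark: parses one file's content into
// (column 4, column 1) pairs, i.e. (status, order id).
def parseOrders(content: String): Seq[(String, String)] =
  content
    .split("\n")
    .toSeq
    .map(_.split(",").map(_.trim))   // split each line into trimmed fields
    .collect { case fields if fields.length >= 4 => (fields(3), fields(0)) } // skip short or empty lines

// Sample content copied from the first two lines of the question's data.
val sample =
  "1,2013-07-25 00:00:00.0,11599,CLOSED\n2,2013-07-25 00:00:00.0,256,PENDING_PAYMENT"

parseOrders(sample).foreach(println)

// In spark-shell, assuming `wholefiles` is the RDD[(String, String)] from the question:
// val pairs = wholefiles.flatMap { case (_, content) => parseOrders(content) }
```

Using flatMap directly on the (path, content) pairs avoids the separate map and flatten steps, and leaving out collect() keeps the result as an RDD[(String, String)] rather than a local array.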
I hope this helps.