I need to use Scala to create a Map for each row, keyed by column. For example, given:
sunny,hot,high,FALSE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
I want the output to be:
RDD[List(
Map(
'0 -> 'sunny,
'1 -> 'hot,
'2 -> 'high,
'3 -> 'false,
'4 -> 'no
),
Map(
'0 -> 'overcast,
'1 -> 'hot,
'2 -> 'high,
'3 -> 'false,
'4 -> 'yes
),
Map(
'0 -> 'rainy,
'1 -> 'mild,
'2 -> 'high,
'3 -> 'false,
'4 -> 'yes
)
)]
Here, for each column, the column number is the key and the column value is the value of the key-value pair.
Answer 0 (score: 6)
val s = """sunny,hot,high,FALSE,no
|overcast,hot,high,FALSE,yes
|rainy,mild,high,FALSE,yes""".stripMargin
s.split("\n").map { line =>
line.split(",").zipWithIndex.map{ case (word, idx) => idx -> word}.toMap
}.toList
yields:
List(Map(0 -> sunny, 1 -> hot, 2 -> high, 3 -> FALSE, 4 -> no),
Map(0 -> overcast, 1 -> hot, 2 -> high, 3 -> FALSE, 4 -> yes),
Map(0 -> rainy, 1 -> mild, 2 -> high, 3 -> FALSE, 4 -> yes))
zipWithIndex maps a Seq to tuples of (value, index). For example,
Seq('a', 'b').zipWithIndex yields Seq[(Char, Int)] = List((a,0), (b,1))
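To see each step in isolation, here is a small sketch that applies zipWithIndex to one row, swaps each pair, and builds the Map. The object name ZipWithIndexDemo and the choice of sample row are only for illustration and are not part of the original answer.

```scala
object ZipWithIndexDemo {
  // One sample row taken from the question's data.
  val row: String = "sunny,hot,high,FALSE,no"

  // Pair every field with its position: (value, index).
  val indexed: Array[(String, Int)] = row.split(",").zipWithIndex

  // Flip each pair to (index, value) so the column number becomes the key.
  val byColumn: Map[Int, String] = indexed.map(_.swap).toMap

  def main(args: Array[String]): Unit = {
    println(indexed.toList)
    println(byColumn)
  }
}
```

The intermediate value keeps the field first and the index second, which is why the pairs have to be flipped (or rebuilt with a case pattern) before calling toMap.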
We can simplify the function to:
s.split("\n").map { line =>
line.split(",").zipWithIndex.map(_.swap).toMap
}.toList
For a Spark RDD, the same mapping is:
sc.textFile(<file-with-data>).map { line =>
line.split(",").zipWithIndex.map(_.swap).toMap
}
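One caveat with this approach: String.split(",") discards trailing empty strings, so a row that ends in empty fields would produce fewer keys than the other rows. A minimal sketch of one way around that, assuming every row should keep all of its columns; rowToMap and RowMaps are hypothetical names, and the second sample line is made up to show the effect:

```scala
object RowMaps {
  // Hypothetical helper: turns one comma-separated line into Map(columnIndex -> value).
  // A limit of -1 keeps trailing empty fields, which split(",") alone would drop.
  def rowToMap(line: String): Map[Int, String] =
    line.split(",", -1).zipWithIndex.map(_.swap).toMap

  def main(args: Array[String]): Unit = {
    // The second line is an invented example with three empty trailing fields.
    val lines = List("sunny,hot,high,FALSE,no", "rainy,mild,,,")
    lines.map(rowToMap).foreach(println)
  }
}
```

Because rowToMap is a plain function from String to Map[Int, String], it should also be usable as the argument to map in the RDD version above. Note that it still does no quoting or escaping, so it is only suitable for simple comma-separated data like the sample.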
Thanks to @Paul