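How do I create a Map for each row, keyed by column, using Scala?

Time: 2014-11-15 07:04:24

Tags: scala

I need to create a Map for each row, keyed by column, using Scala. For example, given: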

sunny,hot,high,FALSE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes

I would like the output to be:

RDD[List(
  Map(
    '0 -> 'sunny,
    '1 -> 'hot,
    '2 -> 'high,
    '3 -> 'false,
    '4 -> 'no
  ),
  Map(
    '0 -> 'overcast,
    '1 -> 'hot,
    '2 -> 'high,
    '3 -> 'false,
    '4 -> 'yes
  ),
  Map(
    '0 -> 'rainy,
    '1 -> 'mild,
    '2 -> 'high,
    '3 -> 'false,
    '4 -> 'yes
  )
)]

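Here we consider each column: the column number is the key and the column value is the value of the key-value pair.

1 answer:

Answer 0: (score: 6)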

Plain Scala

val s = """sunny,hot,high,FALSE,no
          |overcast,hot,high,FALSE,yes
          |rainy,mild,high,FALSE,yes""".stripMargin


s.split("\n").map { line =>
  line.split(",").zipWithIndex.map{ case (word, idx) => idx -> word}.toMap
}.toList

yields:
List(Map(0 -> sunny, 1 -> hot, 2 -> high, 3 -> FALSE, 4 -> no), 
     Map(0 -> overcast, 1 -> hot, 2 -> high, 3 -> FALSE, 4 -> yes), 
     Map(0 -> rainy, 1 -> mild, 2 -> high, 3 -> FALSE, 4 -> yes))

  • split splits the string on the delimiter
  • zipWithIndex maps a Seq to tuples of (value, index)

    Seq('a','b').zipWithIndex yields Seq[(Char, Int)] = List((a,0), (b,1))
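
To make the two steps above concrete, here is a small sketch that names each intermediate value for a single line. The names `line`, `words`, `indexed` and `row` are only illustrative and do not appear in the answer.

```scala
// Intermediate values for one input line, step by step.
val line = "sunny,hot,high,FALSE,no"

// 1. split: String => Array[String]
val words: Array[String] = line.split(",")

// 2. zipWithIndex: pair each element with its position, as (value, index)
val indexed: Array[(String, Int)] = words.zipWithIndex

// 3. flip each pair to (index, value) and collect the pairs into a Map
val row: Map[Int, String] = indexed.map { case (word, idx) => idx -> word }.toMap
// row(0) is "sunny", row(4) is "no"
```

Note that the keys are Int and the values are String, so the original casing (FALSE) is preserved.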


We can simplify the function to:

s.split("\n").map { line =>
  line.split(",").zipWithIndex.map(_.swap).toMap
}.toList
  • Because the result of zipWithIndex is a collection of tuples, and tuples have a swap method, we don't need to swap the elements ourselves
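
As a quick check of the point above, the following sketch shows what swap does to a single pair and that both formulations build the same Map. The value names are illustrative only.

```scala
// Tuple2.swap exchanges the two components of a pair.
val pair: (String, Int) = ("sunny", 0)
val swapped: (Int, String) = pair.swap // (0, "sunny")

// Both formulations produce the same Map[Int, String].
val sample = Array("sunny", "hot", "high")
val viaPattern = sample.zipWithIndex.map { case (word, idx) => idx -> word }.toMap
val viaSwap = sample.zipWithIndex.map(_.swap).toMap
// viaPattern == viaSwap
```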

For Spark

sc.textFile(<file-with-data>).map { line =>
  line.split(",").zipWithIndex.map(_.swap).toMap
}
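
Since the per-line logic is identical in the plain Scala and Spark versions, one option is to give it a name so it can be tried out without a cluster. This is only a sketch: `lineToMap` is a hypothetical helper name, not something from the answer, and the note about trailing empty fields is an extra caveat about String.split.

```scala
// Hypothetical helper: the per-line logic from the answer, given a name.
def lineToMap(line: String): Map[Int, String] =
  line.split(",").zipWithIndex.map(_.swap).toMap

// Plain Scala usage on a couple of sample lines.
val rows: List[Map[Int, String]] =
  List("sunny,hot,high,FALSE,no", "rainy,mild,high,FALSE,yes").map(lineToMap)

// Caveat: split(",") drops trailing empty fields,
// while split(",", -1) keeps them.
val dropped = "a,b,,".split(",").length     // 2
val kept    = "a,b,,".split(",", -1).length // 4
```

Assuming sc is a SparkContext, the same function could be passed to map on the result of sc.textFile, which would give an RDD[Map[Int, String]].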

Thanks to @Paul