Scala Array to (String, Map[String, Any])

Time: 2017-03-07 20:31:50

Tags: arrays scala apache-spark rdd

I have a Scala array, "visitedArray", with the following values:

Array(
    (Map(url -> http://www.tumblr.com/tagged/abc), Map(visited -> true)), 
    (Map(url -> http://www.tumblr.com/tagged/random-blog), Map(visited -> true)), 
    (Map(url -> http://www.livestream.com/forum/1),Map(visited -> false))
    ....

However, I want to convert it to (String, Map[String, Any]) and would like the result to look like this:

(
    (http://www.tumblr.com/tagged/kate-beckett, Map(visited -> true)),
    (http://www.tumblr.com/tagged/random-blog, Map(visited -> true))
    ....

I tried:

val testRdd = sc.parallelize(visitedArray)
val formatedRdd = testRdd.map(t => (t._1("url"), t._2))

However, it does not return the desired format. It returns:

Array(
    (http://www.tumblr.com/tagged/kate-beckett, Map(visited -> true)),
    (http://www.tumblr.com/tagged/random-blog, Map(visited -> true))
    ....

How can I get what I want (a conversion to (String, Map[String, Any])) without the Array()?

1 Answer:

Answer 0: (score: 0)

If I understand correctly, you want this:

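  val a = Array(
    (Map("url" -> "http://www.tumblr.com/tagged/abc"), Map("visited" -> true)),
    (Map("url" -> "http://www.tumblr.com/tagged/random-blog"), Map("visited" -> true)),
    (Map("url" -> "http://www.livestream.com/forum/1"), Map("visited" -> false)))

  // Destructure each pair: take the URL string out of the first map, keep the second map as-is.
  // The element types are already known statically, so no type patterns are needed
  // (patterns such as m1: Map[String, String] cannot be checked at runtime because of erasure).
  a.map {
    case (m1, m2) =>
      (m1("url"), m2)
  }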

This results in:

Array(
  ("http://www.tumblr.com/tagged/abc", Map("visited" -> true)),
  ("http://www.tumblr.com/tagged/random-blog", Map("visited" -> true)),
  ("http://www.livestream.com/forum/1", Map("visited" -> false))
): Array[(String, Map[String, Boolean])]

You can then call sc.parallelize on that.
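The result above has element type (String, Map[String, Boolean]), while the question asks for (String, Map[String, Any]). Below is a minimal sketch of one way to widen the value type, assuming the input has the same shape as a. The names WidenExample, visited, widened and converted are hypothetical and chosen only for illustration. It relies on the immutable Map being covariant in its value type, so a Map[String, Boolean] can be assigned to a Map[String, Any] directly.

```scala
object WidenExample {
  // Same shape as the array `a` in the answer (shortened to two entries).
  val visited: Array[(Map[String, String], Map[String, Boolean])] = Array(
    (Map("url" -> "http://www.tumblr.com/tagged/abc"), Map("visited" -> true)),
    (Map("url" -> "http://www.livestream.com/forum/1"), Map("visited" -> false))
  )

  // Extract the URL and widen the second map's value type to Any.
  val converted: Array[(String, Map[String, Any])] =
    visited.map { case (m1, m2) =>
      val widened: Map[String, Any] = m2 // allowed because Map is covariant in its values
      (m1("url"), widened)
    }
}
```

As with the result above, converted could then be passed to sc.parallelize.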

You only see Array at the beginning because that is how Scala prints the object. It is not actually part of the data.

For example, using a List:

  // Same mapping, then converted to a List; .toList is called with a dot
  // because postfix operator notation needs an extra language import.
  a.map {
    case (m1, m2) =>
      (m1("url"), m2)
  }.toList

List(
  ("http://www.tumblr.com/tagged/abc", Map("visited" -> true)),
  ("http://www.tumblr.com/tagged/random-blog", Map("visited" -> true)),
  ("http://www.livestream.com/forum/1", Map("visited" -> false))
): scala.package.List[(String, Map[String, Boolean])]
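The point that Array( or List( is only the printed form of the collection can also be checked by rendering each element on its own. This is a small sketch, not part of the original answer; PrintExample, pairs and lines are hypothetical names.

```scala
object PrintExample {
  // Two (String, Map[String, Boolean]) pairs, as produced by the mapping above.
  val pairs: Array[(String, Map[String, Boolean])] = Array(
    ("http://www.tumblr.com/tagged/abc", Map("visited" -> true)),
    ("http://www.livestream.com/forum/1", Map("visited" -> false))
  )

  // Render each element separately. No "Array(" prefix appears,
  // because the prefix belongs to how the whole collection is displayed.
  val lines: Array[String] = pairs.map { case (url, flags) => s"($url, $flags)" }

  def main(args: Array[String]): Unit = lines.foreach(println)
}
```

Each printed line is just one (url, map) pair, which suggests the data already has the desired shape regardless of which collection holds it.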