Scala: flatten an embedded list of lists

Time: 2017-07-15 17:30:54

Tags: scala list spark-streaming

I created a Twitter data stream that shows the hashtags, the author, and the mentioned users in the following format:

(List(timetofly, hellocake),Shera_Eyra,List(blxcknicotine, kimtheskimm))

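Because of the embedded lists, I can't run any analysis on data in this format. How can I create another data stream that shows the data in the following format?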

timetofly, Shera_Eyra, blxcknicotine
timetofly, Shera_Eyra, kimtheskimm
hellocake, Shera_Eyra, blxcknicotine
hellocake, Shera_Eyra, kimtheskimm

Here is the code that generates the data:

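 import org.apache.spark.SparkConf
 import org.apache.spark.streaming.{Seconds, StreamingContext}
 import org.apache.spark.streaming.twitter.TwitterUtils

 val sparkConf = new SparkConf().setAppName("TwitterPopularTags")
 val ssc = new StreamingContext(sparkConf, Seconds(sampleInterval)) // sampleInterval: batch length in seconds, defined elsewhere
 val stream = TwitterUtils.createStream(ssc, None) // None: read the twitter4j OAuth settings from system properties
 // For each tweet: (hashtags as an Array, author's screen name, mentioned users as a List)
 val data = stream.map { line =>
   (line.getHashtagEntities.map(_.getText),
    line.getUser().getScreenName(),
    line.getUserMentionEntities.map(_.getScreenName).toList)
 }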

2 Answers:

Answer 0 (score: 1)

In your snippet, data is a DStream[(Array[String], String, List[String])]. To get a DStream[String] in the desired format, you can use flatMap and map:

val data = stream.map { line =>
  (line.getHashtagEntities.map(_.getText),
   line.getUser().getScreenName(),
   line.getUserMentionEntities.map(_.getScreenName).toList)
}

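// Pair every hashtag with every mention, keeping the author between them, then join each triple into one string
val data2 = data.flatMap(a => a._1.flatMap(b => a._3.map(c => (b, a._2, c))))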
                .map { case (hash, user, mention) => s"$hash, $user, $mention" }

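The flatMap produces a DStream[(String, String, String)], where each tuple consists of a hashtag entity, the user, and a mention entity. The subsequent call to map with a pattern match then creates a DStream[String], where each String consists of the tuple's elements separated by a comma and a space.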
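The same flatMap-then-map pipeline can be tried without Spark or Twitter credentials by running it on an ordinary Scala Seq that stands in for one micro-batch. This is a minimal sketch assuming the tuple shape from the question; the object name FlattenSketch, the Tweet alias, and the second sample tweet are hypothetical. Because each tweet's rows are the product of its hashtags and its mentions, a tweet with no hashtags or no mentions contributes no rows at all.

```scala
object FlattenSketch {
  // Same shape as one record in the question: (hashtags, author, mentioned users).
  type Tweet = (Seq[String], String, Seq[String])

  // One output row per (hashtag, mention) pair; the author is repeated in every row.
  // A tweet with no hashtags or no mentions therefore yields no rows.
  def flatten(tweets: Seq[Tweet]): Seq[String] =
    tweets
      .flatMap { case (hashtags, user, mentions) =>
        hashtags.flatMap(hash => mentions.map(mention => (hash, user, mention)))
      }
      .map { case (hash, user, mention) => s"$hash, $user, $mention" }

  def main(args: Array[String]): Unit = {
    // Hypothetical micro-batch: the tweet from the question plus one with no mentions.
    val batch: Seq[Tweet] = Seq(
      (Seq("timetofly", "hellocake"), "Shera_Eyra", Seq("blxcknicotine", "kimtheskimm")),
      (Seq("example"), "another_user", Seq.empty[String])
    )
    flatten(batch).foreach(println)
  }
}
```

Running main prints the four lines requested in the question; the second, mention-less tweet adds nothing.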

Answer 1 (score: 0)

I would use a for-comprehension:

  val data = (List("timetofly", "hellocake"), "Shera_Eyra", List("blxcknicotine", "kimtheskimm"))

  val result = for {
    hashtag <- data._1
    user = data._2
    mentionedUser <- data._3
  } yield (hashtag, user, mentionedUser)

  result.foreach(println)

Output:

(timetofly,Shera_Eyra,blxcknicotine)
(timetofly,Shera_Eyra,kimtheskimm)
(hellocake,Shera_Eyra,blxcknicotine)
(hellocake,Shera_Eyra,kimtheskimm)

If you prefer a seq of string lists rather than a seq of string tuples, change the yield to give you a list: yield List(hashtag, user, mentionedUser)
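To get the exact comma-separated lines the question asks for, the list-yielding variant can be joined with mkString(", "). A for-comprehension with two generators desugars to hashtags.flatMap(hashtag => mentions.map(mentionedUser => ...)), the same nested flatMap/map as the first answer, so both approaches produce the same rows in the same order. This minimal sketch uses plain Scala collections; the object name ForComprehensionSketch and the function rows are hypothetical, and the tuple is destructured up front instead of using user = data._2 inside the comprehension.

```scala
object ForComprehensionSketch {
  // The record from the question: (hashtags, author, mentioned users).
  val data: (List[String], String, List[String]) =
    (List("timetofly", "hellocake"), "Shera_Eyra", List("blxcknicotine", "kimtheskimm"))

  // Yield one List per (hashtag, mention) pair, then join its elements with ", ".
  def rows(tweet: (List[String], String, List[String])): List[String] = {
    val (hashtags, user, mentions) = tweet
    for {
      hashtag <- hashtags
      mentionedUser <- mentions
    } yield List(hashtag, user, mentionedUser).mkString(", ")
  }

  def main(args: Array[String]): Unit =
    rows(data).foreach(println)
}
```

In the streaming job, the question's records could presumably be fed through this with data.flatMap(t => ForComprehensionSketch.rows((t._1.toList, t._2, t._3))); the .toList is there because the hashtags arrive as an Array. That Spark line is a sketch and has not been run against a live stream.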