How to create a key-value RDD from Kafka topic data

Time: 2017-03-22 09:59:47

Tags: scala spark-streaming

I am reading data from a Kafka topic in a Spark Streaming job. I need to create a key-value RDD from the data.

val messages = KafkaUtils.createStream(streamingContext, "localhost:2181","abc",topics, StorageLevel.MEMORY_ONLY)
messages.print()

  // create key value RDD out of CustomerId and Tokens
  val xactionByCustomer = messages.map(_._2).map {
    transaction =>
      val key = transaction.customerId
      var tokens = transaction.tokens
      (key, tokens)
  }

Error:

[error] /home/ec2-user/alok/marseille/src/main/scala/com/jcalc/feed/MarkovPredictor.scala:115: value customerId is not a member of String
[error]       val key = transaction.customerId
[error]                             ^
[error] /home/ec2-user/alok/marseille/src/main/scala/com/jcalc/feed/MarkovPredictor.scala:116: value tokens is not a member of String
[error]       var tokens = transaction.tokens
[error]                                ^
[error] two errors found
[error] (compile:compileIncremental) Compilation failed

Sample data:

(null,W3Q6TF3CCI,X84N230CIH,NNN)
(null,O8IV7KEXT0,G1D590G05V,NNS)
(null,LBQKYNE081,MYU0O7JC5H,NHN)
(null,SRB4P501SW,E0FTI4RN7X,LHL)
(null,HELRFMAXVS,W6F704TN21,LHN)
(null,FS4PLQLI63,TK1O9YHS15,NNN)
(null,KI70UDVJLC,4ANBDAW7SU,LNN)
(null,IP6IVPGCWQ,MD93GGGBKA,NNN)
(null,976N9RPXSP,JKU0SV7UMH,LNL)
(null,J4V3AB1YVT,J9WXC1BRAY,LHN)

I am only interested in the 2nd and 4th values for the pair RDD. Any help?

1 answer:

Answer 0 (score: 0)

Your data looks like a tuple of type (String, String, String, String). Since you are interested in the 2nd and 4th values, mapping:

val xactionByCustomer = messages.map(row => (row._2, row._4))

should be enough.
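
One caveat worth checking: the compile error in the question reports that `transaction` is a `String`, which suggests that `messages` holds (key, value) pairs whose value is a single comma-separated line, not a 4-field tuple. If that is the case, a minimal sketch like the one below, which splits the value and keeps its first and third fields, might be closer to what is needed. The object name `KeyValueSketch`, the helper `toPair`, and the three-field comma-delimited layout are assumptions for illustration, inferred from the sample output.

```scala
// Sketch only: assumes each Kafka message value is a comma-separated line such as
// "W3Q6TF3CCI,X84N230CIH,NNN" (inferred from the sample output, not confirmed).
object KeyValueSketch {

  // Hypothetical helper: returns (first field, third field),
  // or None when the line does not have exactly three fields.
  def toPair(line: String): Option[(String, String)] =
    line.split(",", -1) match {
      case Array(customerId, _, tokens) => Some((customerId, tokens))
      case _                            => None
    }

  def main(args: Array[String]): Unit = {
    // Stand-in for the message values; the last entry is deliberately malformed.
    val values = Seq(
      "W3Q6TF3CCI,X84N230CIH,NNN",
      "O8IV7KEXT0,G1D590G05V,NNS",
      "malformed"
    )

    // flatMap keeps only the lines that parsed successfully.
    val pairs = values.flatMap(line => toPair(line).toList)
    pairs.foreach(println)
  }
}
```

In the streaming job this could presumably be applied as `messages.map(_._2).flatMap(line => KeyValueSketch.toPair(line).toList)`, which drops malformed lines instead of failing on them.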