Spark context on Bluemix adds a null to the JSON payload

Date: 2016-08-22 14:40:19

Tags: apache-spark ibm-cloud message-hub

I am streaming messages from Message Hub to a Spark instance on Bluemix. I am using a Java client to publish a simple JSON message to Message Hub.

The JSON message:

{"country":"Netherlands","dma_code":"0","timezone":"Europe\/Amsterdam","area_code":"0","ip":"46.19.37.108","asn":"AS196752","continent_code":"EU","isp":"Tilaa V.O.F.","longitude":5.75,"latitude":52.5,"country_code":"NL","country_code3":"NLD"}

When I start streaming in Spark, the messages I receive have an extra null at the beginning:

(null,{"country":"Netherlands","dma_code":"0","timezone":"Europe\/Amsterdam","area_code":"0","ip":"46.19.37.108","asn":"AS196752","continent_code":"EU","isp":"Tilaa V.O.F.","longitude":5.75,"latitude":52.5,"country_code":"NL","country_code3":"NLD"})

Could you let me know why the Spark context prepends this null, and how I can remove it?

KafkaSender code:

  // props, topic and message are assumed to be configured elsewhere
  KafkaProducer<String, String> kafkaProducer = new KafkaProducer<String, String>(props);

  // Two-argument constructor: the record is sent without a key
  ProducerRecord<String, String> producerRecord = new ProducerRecord<String, String>(topic, message);

  // Blocking send; the returned RecordMetadata lets us validate topic, partition and offset
  RecordMetadata recordMetadata = kafkaProducer.send(producerRecord).get();
  System.out.println("topic where message is published : " + recordMetadata.topic());
  System.out.println("partition where message is published : " + recordMetadata.partition());
  System.out.println("message offset # : " + recordMetadata.offset());
  kafkaProducer.close();

Thanks, Raj

1 answer:

Answer 0 (score: 0)

Your key is null: each Kafka record is a (key, value) pair, so the first element is your key and the second is your value.
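To see why the output looks this way, here is a minimal, self-contained sketch (using plain `java.util` types rather than the Kafka client, so it runs without a broker). It models a consumed record as a key/value pair whose key is null, which is exactly what the two-argument `ProducerRecord` constructor in the question produces:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        // A Kafka record is always a (key, value) pair. Publishing with
        // new ProducerRecord<>(topic, message) leaves the key null.
        Map.Entry<String, String> record =
                new SimpleEntry<>(null, "{\"country\":\"Netherlands\"}");

        // Printed as a tuple, this reproduces the leading null from the question.
        System.out.println("(" + record.getKey() + "," + record.getValue() + ")");
        // prints (null,{"country":"Netherlands"})

        // The JSON payload itself is untouched; take the value side only.
        System.out.println(record.getValue());
        // prints {"country":"Netherlands"}
    }
}
```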

I would suggest posting the code that publishes the message to Kafka/Message Hub, to get a better answer.

To fix your issue: if your goal is just to print the messages, you can do something like the following, which prints the data to stdout and ignores the null key.

stream.foreachRDD(recordRDD => {
  recordRDD.foreach(record => print(record._2))
})
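The same idea, extracting only the value side of each (key, value) record, can be sketched in Java without Spark or Kafka (a simulation with `java.util` types, not the actual DStream API):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ValueOnlyDemo {
    public static void main(String[] args) {
        // Stand-in for the (key, value) records a Kafka stream delivers;
        // the keys are null because the producer never set one.
        List<Map.Entry<String, String>> batch = List.of(
                new SimpleEntry<>(null, "{\"country\":\"Netherlands\"}"),
                new SimpleEntry<>(null, "{\"country\":\"Belgium\"}"));

        // Equivalent of record._2 in the Scala snippet above: keep only the values.
        List<String> values = batch.stream()
                .map(Map.Entry::getValue)
                .collect(Collectors.toList());

        values.forEach(System.out::println);
    }
}
```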