How to merge multiple DStreams in Spark using Scala?

Date: 2018-01-26 20:46:16

Tags: scala apache-spark apache-kafka spark-streaming dstream

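I have three incoming streams from Kafka. I parse the streams, which arrive as JSON, extract them into the appropriate case classes, and form DStreams with the following schemas: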

case class Class1(incident_id: String,
                  crt_object_id: String,
                  source: String,
                  order_number: String)

case class Class2(crt_object_id: String,
                  hangup_cause: String)

case class Class3(crt_object_id: String,
                  text: String)

I want to join these three DStreams on the common column crt_object_id. The resulting DStream should have the following form:

case class Merged(incident_id: String,
                  crt_object_id: String,
                  source: String,
                  order_number: String,
                  hangup_cause: String,
                  text: String)

Please suggest a way to do this. I am new to both Spark and Scala.

1 Answer:

Answer 0 (score: 2)

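The Spark Streaming documentation gives the signature of the join method:

join(otherStream, [numTasks])

When called on two DStreams of (K, V) and (K, W) pairs, returns a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.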
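To make the quoted semantics concrete, here is a minimal plain-Scala sketch (no Spark dependency) that applies the same rule to two in-memory sequences, standing in for the two RDDs of a single batch. `joinPairs` and the sample values are hypothetical illustrations, not part of the Spark API.

```scala
// Hypothetical plain-Scala illustration of join semantics on (K, V) and (K, W) pairs.
// It is NOT Spark code: the two Seqs stand in for the RDDs of one batch.
object JoinSemantics {
  // Inner join: for every key present on both sides, emit every (v, w) combination.
  // Keys that appear on only one side produce no output.
  def joinPairs[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] = {
    // Group the right side by key so each left element can find all its matches.
    val rightByKey: Map[K, Seq[W]] =
      right.groupBy(_._1).map { case (k, kws) => k -> kws.map(_._2) }
    for {
      (k, v) <- left
      w      <- rightByKey.getOrElse(k, Seq.empty)
    } yield (k, (v, w))
  }

  def main(args: Array[String]): Unit = {
    val left  = Seq("a" -> 1, "a" -> 2, "b" -> 3) // "b" has no match on the right
    val right = Seq("a" -> "x", "c" -> "y")       // "c" has no match on the left
    println(joinPairs(left, right))
  }
}
```

Both left elements with key "a" are paired with "x", while "b" and "c" disappear because they occur on only one side.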

Note that join needs DStreams of key-value pairs rather than case classes. Therefore, you have to extract the field you want to join on (here crt_object_id) from each case class, join the streams, and pack the resulting stream into the appropriate case class:

import org.apache.spark.streaming.dstream.DStream

case class Class1(incident_id: String, crt_object_id: String, source: String, order_number: String)
case class Class2(crt_object_id: String, hangup_cause: String)
case class Class3(crt_object_id: String, text: String)
case class Merged(incident_id: String, crt_object_id: String, source: String, order_number: String, hangup_cause: String, text: String)

val stream1: DStream[Class1] = ...
val stream2: DStream[Class2] = ...
val stream3: DStream[Class3] = ...

// Key every record by crt_object_id so the streams become (K, V) pair DStreams.
val transformedStream1: DStream[(String, Class1)] = stream1.map { c1 => (c1.crt_object_id, c1) }
val transformedStream2: DStream[(String, Class2)] = stream2.map { c2 => (c2.crt_object_id, c2) }
val transformedStream3: DStream[(String, Class3)] = stream3.map { c3 => (c3.crt_object_id, c3) }

// Chaining two joins nests the values as ((Class1, Class2), Class3).
val joined: DStream[(String, ((Class1, Class2), Class3))] =
  transformedStream1.join(transformedStream2).join(transformedStream3)

// Unpack the nested tuple into the flat Merged case class.
val merged: DStream[Merged] = joined.map {
  case (crt_object_id, ((c1, c2), c3)) =>
    Merged(c1.incident_id, crt_object_id, c1.source, c1.order_number, c2.hangup_cause, c3.text)
}
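The pipeline above can be exercised without a cluster by simulating one batch with plain Scala collections. This is a hedged sketch: `MergeOneBatch`, its local `join` helper and all sample values are hypothetical, and the Seqs only stand in for the per-batch RDDs of the three DStreams. It shows the same key, join, join, unpack steps and the resulting nested tuple shape.

```scala
// Hypothetical single-batch simulation of the answer's pipeline (no Spark dependency).
// Seqs stand in for the RDDs that the three DStreams would produce in one batch.
object MergeOneBatch {
  case class Class1(incident_id: String, crt_object_id: String, source: String, order_number: String)
  case class Class2(crt_object_id: String, hangup_cause: String)
  case class Class3(crt_object_id: String, text: String)
  case class Merged(incident_id: String, crt_object_id: String, source: String,
                    order_number: String, hangup_cause: String, text: String)

  // Inner join on the key, producing the same (K, (V, W)) shape as DStream.join.
  def join[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k, v)  <- left
      (k2, w) <- right if k2 == k
    } yield (k, (v, w))

  def merge(b1: Seq[Class1], b2: Seq[Class2], b3: Seq[Class3]): Seq[Merged] = {
    // Key each record by crt_object_id, like the map steps in the answer.
    val keyed1 = b1.map(c1 => (c1.crt_object_id, c1))
    val keyed2 = b2.map(c2 => (c2.crt_object_id, c2))
    val keyed3 = b3.map(c3 => (c3.crt_object_id, c3))
    // Two chained joins yield the nested value ((c1, c2), c3), which is then flattened.
    join(join(keyed1, keyed2), keyed3).map {
      case (id, ((c1, c2), c3)) =>
        Merged(c1.incident_id, id, c1.source, c1.order_number, c2.hangup_cause, c3.text)
    }
  }

  def main(args: Array[String]): Unit = {
    val batch1 = Seq(Class1("inc-1", "obj-1", "web", "ord-1"), Class1("inc-2", "obj-2", "app", "ord-2"))
    val batch2 = Seq(Class2("obj-1", "busy"))                       // no record for obj-2
    val batch3 = Seq(Class3("obj-1", "hello"), Class3("obj-2", "ignored"))
    merge(batch1, batch2, batch3).foreach(println)
  }
}
```

Only obj-1 survives, because obj-2 is missing from the second stream and an inner join drops it. In Spark Streaming the same rule applies per batch interval: records are joined only with records from the same batch, so if the three Kafka topics can deliver a given crt_object_id at different times, windowed streams or an outer variant such as leftOuterJoin may be needed instead of a plain join.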
