Using NiFi in a secure cluster

Asked: 2016-10-11 23:13:11

Tags: java serialization spark-streaming apache-nifi

I am trying to use Spark Streaming to read data from NiFi running in a secure cluster. I authenticate by passing an SSLContext to the SiteToSiteClient, but the SSLContext is not serializable.
My code looks like this:

    import java.nio.charset.StandardCharsets

    import org.apache.nifi.remote.client.SiteToSiteClient
    import org.apache.nifi.spark.NiFiReceiver
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object NifiSparkStreaming {
      def main(args: Array[String]) {
        // Load the client certificate into a keystore and build an SSLContext from it
        val pKeyFile = new java.io.File("/path/to/file-cert.pfx")
        val pKeyPassword = "password"
        val keyStore = java.security.KeyStore.getInstance("JKS")

        val kmf = javax.net.ssl.KeyManagerFactory.getInstance(javax.net.ssl.KeyManagerFactory.getDefaultAlgorithm())
        val keyInput = new java.io.FileInputStream(pKeyFile)
        keyStore.load(keyInput, pKeyPassword.toCharArray())
        keyInput.close()
        kmf.init(keyStore, pKeyPassword.toCharArray())

        val sslContext = javax.net.ssl.SSLContext.getInstance("SSL")
        sslContext.init(kmf.getKeyManagers(), null, new java.security.SecureRandom())

        // Site-to-site client configuration carrying the (non-serializable) SSLContext
        val conf = new SiteToSiteClient.Builder()
          .sslContext(sslContext)
          .url("https://urlOfNifi:9090/nifi/")
          .portName("Spark_Test")
          .buildConfig()

        val config = new SparkConf().setAppName("Nifi_Spark_Data")
        val sc = new SparkContext(config)
        val ssc = new StreamingContext(sc, Seconds(10))

        // Receive NiFi data packets and convert their content to strings
        val lines = ssc.receiverStream(new NiFiReceiver(conf, StorageLevel.MEMORY_ONLY))
        val text = lines.map(dataPacket => new String(dataPacket.getContent, StandardCharsets.UTF_8))

        text.print()
        ssc.start()
        ssc.awaitTermination()
      }
    }

What I want is to receive streaming data from NiFi, but when I start my Spark Streaming application I get the following error:

 Exception during serialization: java.io.NotSerializableException: javax.net.ssl.SSLContext
Serialization stack:
        - object not serializable (class: javax.net.ssl.SSLContext, value: javax.net.ssl.SSLContext@2181e104)
        - field (class: org.apache.nifi.remote.client.SiteToSiteClient$StandardSiteToSiteClientConfig, name: sslContext, type: class javax.net.ssl.SSLContext)
        - object (class org.apache.nifi.remote.client.SiteToSiteClient$StandardSiteToSiteClientConfig, org.apache.nifi.remote.client.SiteToSiteClient$StandardSiteToSiteClientConfig@5a0d6057)
        - field (class: org.apache.nifi.spark.NiFiReceiver, name: clientConfig, type: interface org.apache.nifi.remote.client.SiteToSiteClientConfig)
        - object (class org.apache.nifi.spark.NiFiReceiver, org.apache.nifi.spark.NiFiReceiver@224fb09a)
        - element of array (index: 0)
        - array (class [Lorg.apache.spark.streaming.receiver.Receiver;, size 1)
        - field (class: scala.collection.mutable.WrappedArray$ofRef, name: array, type: class [Ljava.lang.Object;)
        - object (class scala.collection.mutable.WrappedArray$ofRef, WrappedArray(org.apache.nifi.spark.NiFiReceiver@224fb09a))
        - writeObject data (class: org.apache.spark.rdd.ParallelCollectionPartition)
        - object (class org.apache.spark.rdd.ParallelCollectionPartition, org.apache.spark.rdd.ParallelCollectionPartition@87d)
        - field (class: org.apache.spark.scheduler.ResultTask, name: partition, type: interface org.apache.spark.Partition)
        - object (class org.apache.spark.scheduler.ResultTask, ResultTask(12, 0))
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1283)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1271)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1270)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1270)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:697)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:697)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1496)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1458)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1447)

The SiteToSiteClientConfig itself appears to be serializable, but the SSLContext inside it is not. In Spark Streaming, objects that will be used on other nodes must be serializable, and I cannot find a way to make the SSLContext serializable. Is there a way to run Spark Streaming so that it receives NiFi streaming data from a secure cluster?

Thanks in advance.

1 Answer:

Answer (score: 3)

Instead of creating the SSLContext ahead of time, you should be able to call the following methods on SiteToSiteClient.Builder:

    keystoreFilename(...)
    keystorePass(...)
    keystoreType(...)
    truststoreFilename(...)
    truststorePass(...)
    truststoreType(...)

By doing this, the NiFiReceiver will create the SSLContext itself when it builds the SiteToSiteClient from the serialized SiteToSiteClientConfig.
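
A minimal sketch of that approach is below. The paths, passwords, and keystore-type values are placeholders, and the exact argument type of keystoreType(...)/truststoreType(...) (e.g. a KeystoreType enum vs. a string) may differ between site-to-site client versions, so check it against the version you build with:

    // Sketch only: the builder carries keystore/truststore settings (plain strings
    // and enums, all serializable) instead of a pre-built SSLContext; the receiver
    // creates the SSLContext on whichever worker it is deployed to.
    val conf = new SiteToSiteClient.Builder()
      .url("https://urlOfNifi:9090/nifi/")
      .portName("Spark_Test")
      .keystoreFilename("/path/to/file-cert.pfx")     // placeholder path, same on every node
      .keystorePass("password")                       // placeholder password
      .keystoreType(KeystoreType.PKCS12)              // match the file format (PKCS12 for .pfx, JKS for .jks)
      .truststoreFilename("/path/to/truststore.jks")  // placeholder path
      .truststorePass("truststorePassword")           // placeholder password
      .truststoreType(KeystoreType.JKS)
      .buildConfig()

    // The rest of the streaming job is unchanged:
    val lines = ssc.receiverStream(new NiFiReceiver(conf, StorageLevel.MEMORY_ONLY))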

Note that this likely requires the keystore/truststore to be present on every node where the Spark Streaming job runs, and at the same location on each node.