Pushing data from PySpark to SearchGuard-secured Elasticsearch: certificate issue

Date: 2018-10-05 12:00:06

Tags: apache-spark hadoop elasticsearch hdfs yarn

I have an Elasticsearch cluster with SearchGuard enabled, and I am trying to push data into it from Spark.

Environment: CentOS 7, Elasticsearch 6.4.1, Spark 2.3.0, Java openjdk 1.8.0, YARN 2.7.3, HDFS 2.7.3, HDP 2.6.5.0.

Elasticsearch is secured by SearchGuard using PEM certificates. chain-ca.pem has been added to the truststore on all Spark nodes, and I have added the required configuration to my PySpark code:

es_write_conf = {
    "es.nodes" : "esm1,esm2,esm3",
    "es.port" : "9200",
    "es.resource" : str(topic+"_"+year_week+"/"+topic),
    "es.input.json": "true",
    "es.nodes.ingest.only": "true",
    "es.net.http.auth.user": "admin",
    "es.net.http.auth.pass": "admin",
    "es.net.ssl":"true",
    "es.net.ssl.cert.allow.self.signed":"true",
    "es.net.ssl.keystore.location":"file:///usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/security/cacerts",
    "es.net.ssl.keystore.pass":"changeit"
}
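A note on the last two SSL keys, based on how elasticsearch-hadoop distinguishes its SSL settings (my reading of the connector, not something verified against this cluster): `es.net.ssl.keystore.location` is meant for a keystore holding the client's own certificate, while the CA certificates used to verify the server belong in the truststore settings. Since `cacerts` is a truststore, a variant worth sketching is:

```python
# Hypothetical variant of the SSL settings above: cacerts is a CA
# truststore, so point the truststore keys (rather than the keystore
# keys) at it. Untested against this specific cluster.
es_ssl_conf = {
    "es.net.ssl": "true",
    "es.net.ssl.cert.allow.self.signed": "true",
    "es.net.ssl.truststore.location": "file:///usr/lib/jvm/java-1.8.0-openjdk-1.8.0.181-3.b13.el7_5.x86_64/jre/lib/security/cacerts",
    "es.net.ssl.truststore.pass": "changeit",
}
```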

I ran this with spark-submit as the hdfs user:

spark-submit --master local[4] --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/code/ingestion.py demo_machine001

which produced this error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

This was fixed by running spark-submit with sudo. I had previously tried changing the permissions and ownership of the cacerts file instead, but that produced the same error.

I am now trying to run this through YARN and am getting the same error.

Running it as:

spark-submit --master yarn --deploy-mode cluster --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/code/ingestion.py demo_machine001

produces the error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

Running it as:

sudo SPARK_HOME=/usr/hdp/current/spark2-client SPARK_MAJOR_VERSION=2 PYTHONPATH=$SPARK_HOME/python/:$PYTHONPATH PYTHONPATH=$SPARK_HOME/python/lib/py4j-0.10.6-src.zip:$PYTHONPATH spark-submit --master yarn --deploy-mode cluster --jars /home/hdfs/jars/elasticsearch-hadoop-6.4.1.jar,/home/hdfs/jars/spark-streaming-kafka-0-8-assembly_2.11-2.3.1.jar /home/hdfs/everlytics/ingestion.py demo_machine001

produces the error:

Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

I then put the cacerts file into HDFS and changed the setting in my Spark code accordingly:

"es.net.ssl.keystore.location":"hdfs://spm1:8020/certificates/cacerts"

This produced the error:

Caused by: org.elasticsearch.hadoop.EsHadoopIllegalStateException: Cannot initialize SSL - Expected to find keystore file at [hdfs://spm1:8020/certificates/cacerts] but was unable to. Make sure that it is available on the classpath, or if not, that you have specified a valid URI.

I have gone through several iterations of permissions and ownership on the cacerts file, both in HDFS and on the local filesystem, to no avail.

I have also copied cacerts to /tmp on every node for global access, but nothing has worked so far.
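The HDFS error above suggests the connector resolves the keystore/truststore path through the local filesystem or classpath rather than HDFS. One commonly suggested pattern in that situation (an assumption on my part, not something verified in this post) is to ship the file with the job so YARN localizes it into each container's working directory:

```python
# Assumed workaround sketch (untested here): submit the job with
#
#   spark-submit --master yarn --deploy-mode cluster --files /tmp/cacerts ...
#
# YARN copies each --files entry into every container's working
# directory, so the job can then reference it by its bare file name:
es_write_conf = {
    # ... the other settings shown earlier ...
    "es.net.ssl.truststore.location": "cacerts",
    "es.net.ssl.truststore.pass": "changeit",
}
```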

0 answers