Saving a DataFrame with an array column results in org.elasticsearch.hadoop.serialization.EsHadoopSerializationException

Time: 2019-04-26 21:09:25

Tags: elasticsearch pyspark

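I have a DataFrame with an array in one of its columns, and I want to save it to Elasticsearch. However, the executor throws an exception when the DataFrame is saved.

If no column is an array, the DataFrame saves fine. The error only occurs when a column is in array, nested, or JSON format.

The code is as follows: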

from pyspark.sql import SparkSession
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# A script run via spark-submit has to create (or fetch) the session itself.
spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField("id", StringType(), True),
    StructField("email", ArrayType(StringType()), True),  # the array column
])

df = spark.createDataFrame(
    [{"id": "id3", "email": ["email1@gmail.com"]},
     {"id": "id4", "email": ["email1@gmail.com", "email2@gmail.com"]}],
    schema=schema,
)

df.show(truncate=False)

df.write.format("org.elasticsearch.spark.sql") \
    .mode("append") \
    .option("es.input.json", "yes") \
    .option("es.resource", "test/test") \
    .option("es.nodes", "10.1.129.80:9200") \
    .option("es.net.ssl", "true") \
    .option("es.net.ssl.cert.allow.self.signed", "false") \
    .option("es.net.http.auth.user", "username") \
    .option("es.net.http.auth.pass", "password") \
    .option("es.net.ssl.truststore.location", "file:///Users/alfred/Documents/DS/bin/truststore.jks") \
    .option("es.net.ssl.truststore.pass", "xxxx") \
    .option("es.net.ssl.keystore.location", "file:///Users/alfred/Documents/DS/bin/keystore.jks") \
    .option("es.net.ssl.keystore.pass", "xxxx") \
    .option("es.net.ssl.protocol", "TLS") \
    .option("es.write.operation", "upsert") \
    .option("es.spark.dataframe.write.null", "true") \
    .save()

I ran the code with the following command:

spark-submit --jars /Users/alfred/Documents/DS/bin/elasticsearch-spark-20_2.11-6.1.2.jar,/Users/alfred/Documents/DS/bin/scala-library-2.11.12.jar test_spark.py

I get this error:

19/04/27 00:48:37 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('(' (code 40)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: [B@782e5ac0; line: 1, column: 2]
    at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:95)
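A plausible reading of the trace, not confirmed in this thread: `es.input.json` tells ES-Hadoop that each record is already a JSON document string (its default is false), so the connector passes the raw record bytes to its Jackson parser, which then fails on the first character that cannot start a JSON value, here `(`. The DataFrame above holds structured columns rather than JSON strings, so dropping that option is the first thing worth trying. The standard-library sketch below only illustrates the distinction; the tuple-style string is a made-up example, not the connector's actual serialization of a row.

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if `text` is a complete, valid JSON document."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The two rows from the question, as plain Python dicts.
rows = [
    {"id": "id3", "email": ["email1@gmail.com"]},
    {"id": "id4", "email": ["email1@gmail.com", "email2@gmail.com"]},
]

# Serializing each row explicitly yields valid JSON; the array column
# simply becomes a JSON array.
json_docs = [json.dumps(row) for row in rows]

# Hypothetical tuple-style rendering of a row (NOT what ES-Hadoop actually
# produces): text starting with '(' is not a JSON value, the same kind of
# input the Jackson parser rejects in the stack trace above.
tuple_style = "(id3,[email1@gmail.com])"

for doc in json_docs:
    print(doc, parses_as_json(doc))
print(tuple_style, parses_as_json(tuple_style))
```

In other words, `es.input.json` is meant for data that is already serialized JSON text; a DataFrame with typed columns (including arrays) is normally written with that option left at its default.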

Can anyone help me solve this?

0 answers:

No answers