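I have a DataFrame with an array-typed column, and I want to save it to Elasticsearch. However, an executor throws an exception when the DataFrame is written.
If no column is an array, the DataFrame saves fine. The error occurs only when a column is in array/nested/JSON format.
The code is as follows: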
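from pyspark.sql import SparkSession
from pyspark.sql.types import *

# spark-submit does not define `spark` automatically, so create the session here
spark = SparkSession.builder.getOrCreate()

schema = StructType([  # schema: a string id plus an array of email strings
    StructField("id", StringType(), True),
    StructField("email", ArrayType(StringType()), True)])
df = spark.createDataFrame([{"id": "id3", "email": ["email1@gmail.com"]},
                            {"id": "id4", "email": ["email1@gmail.com", "email2@gmail.com"]}],
                           schema=schema)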
df.show(truncate=False)
df.write.format("org.elasticsearch.spark.sql") \
    .mode('append') \
    .option("es.input.json", "yes") \
    .option("es.resource", "test/test") \
    .option("es.nodes", "10.1.129.80:9200") \
    .option("es.net.ssl", "true") \
    .option("es.net.ssl.cert.allow.self.signed", "false") \
    .option("es.net.http.auth.user", "username") \
    .option("es.net.http.auth.pass", "password") \
    .option("es.net.ssl.truststore.location", "file:///Users/alfred/Documents/DS/bin/truststore.jks") \
    .option("es.net.ssl.truststore.pass", "xxxx") \
    .option("es.net.ssl.keystore.location", "file:///Users/alfred/Documents/DS/bin/keystore.jks") \
    .option("es.net.ssl.keystore.pass", "xxxx") \
    .option("es.net.ssl.protocol", "TLS") \
    .option("es.write.operation", "upsert") \
    .option("es.spark.dataframe.write.null", "true") \
    .save()
I ran the code with the following command:
spark-submit --jars /Users/alfred/Documents/DS/bin/elasticsearch-spark-20_2.11-6.1.2.jar,/Users/alfred/Documents/DS/bin/scala-library-2.11.12.jar test_spark.py
I get the following error:
19/04/27 00:48:37 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('(' (code 40)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
at [Source: [B@782e5ac0; line: 1, column: 2]
at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:95)
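As far as I can tell from the message, the JSON parser was handed text that begins with `(` instead of a JSON document. This might be related to `es.input.json`, which I understand tells the connector that the input is already JSON, but that is only my assumption. Below is a plain-Python sketch of the distinction, not the connector's own code: `looks_like_json` is a hypothetical helper, and `row_as_text` is a made-up stand-in, since the log does not show the exact bytes the parser received.

```python
import json


def looks_like_json(text):
    """Return True if `text` parses as a JSON document (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# What a JSON document for the first row of the DataFrame would look like.
row_as_json = json.dumps({"id": "id3", "email": ["email1@gmail.com"]})

# Made-up stand-in for a non-JSON string form of the same row (assumption:
# the real text is unknown, only that it starts with '(').
row_as_text = "(id3, [email1@gmail.com])"

print(looks_like_json(row_as_json))
print(looks_like_json(row_as_text))
```

The first string is what a JSON parser expects; the second fails on its very first character, which matches the position reported in the stack trace.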
Can anyone help me resolve this?