I am writing a Spark Structured Streaming application in which the data processed by Spark needs to be sunk into Elasticsearch.
This is my development environment, so I have a standalone Elasticsearch instance.
I have tried the following two approaches to sink the data from the Dataset into ES:
1. ds.writeStream().format("org.elasticsearch.spark.sql").start("spark/orders");
2. ds.writeStream().format("es").start("spark/orders");
In both cases I get the following error:
Caused by: java.lang.UnsupportedOperationException: Data source es does not support streamed writing
at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:287) ~[spark-sql_2.11-2.1.1.jar:2.1.1]
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:272) ~[spark-sql_2.11-2.1.1.jar:2.1.1]
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:213) ~[spark-sql_2.11-2.1.1.jar:2.1.1]
These are the dependencies from my pom.xml:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.1.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.mongodb.spark</groupId>
<artifactId>mongo-spark-connector_2.11</artifactId>
<version>2.0.0</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.1.1</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-hive_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming-kafka_2.11</artifactId>
<version>1.6.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql-kafka-0-10_2.11</artifactId>
<version>2.1.1</version>
</dependency>
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark-20_2.11</artifactId>
<version>5.6.1</version>
</dependency>
Appreciate any help in resolving this issue.
Answer 0 (score: 1)
You can try:
ds.write.format("org.elasticsearch.spark.sql")
  .option("es.resource", ES_INDEX + "/" + ES_TYPE)
  .option("es.mapping.id", ES_ID)
  .mode("overwrite")
  .save()
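For context, here is a minimal sketch of how this batch write might be wired up (the input file, the orderId id column, and the local Elasticsearch host/port are assumptions for illustration; the es.resource value follows the question):

import org.apache.spark.sql.SparkSession

object EsBatchWrite {
  def main(args: Array[String]): Unit = {
    // Point the connector at the local standalone Elasticsearch node (assumed host/port)
    val spark = SparkSession.builder()
      .appName("es-batch-write")
      .master("local[*]")
      .config("es.nodes", "localhost")
      .config("es.port", "9200")
      .getOrCreate()

    // Any static Dataset/DataFrame works here; "orders.json" is a placeholder input
    val ds = spark.read.json("orders.json")

    ds.write
      .format("org.elasticsearch.spark.sql")
      .option("es.resource", "spark/orders")  // index/type, as in the question
      .option("es.mapping.id", "orderId")     // assumed document-id column
      .mode("overwrite")
      .save()

    spark.stop()
  }
}

Note that this is a batch write, so for a streaming Dataset it only helps if the data is first collected into a static Dataset or written out per micro-batch.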
Answer 1 (score: 1)
The Elasticsearch sink does not support streamed writing, which means you cannot stream the output to Elasticsearch directly. You can instead write the streaming output to Kafka and use Logstash to read from Kafka into Elasticsearch.
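A sketch of the Spark-to-Kafka half of that workaround (this assumes the Structured Streaming Kafka sink, available from Spark 2.2.0 onward, a local broker, and a topic named "orders"; Logstash would then consume the topic with its kafka input and elasticsearch output plugins):

// ds is the streaming Dataset from the question; the Kafka sink expects a string "value" column
val query = ds
  .selectExpr("to_json(struct(*)) AS value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // assumed local broker
  .option("topic", "orders")                           // assumed topic name
  .option("checkpointLocation", "/tmp/orders-checkpoint")
  .start()

query.awaitTermination()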
Answer 2 (score: 0)
Update:
With Spark 2.2.0 and Elasticsearch 6.x, streamed writing is now supported.
Dependency:
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark-20_2.11</artifactId>
<version>6.2.4</version>
</dependency>
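writeStream code, as a minimal sketch: it assumes the 6.x connector above, a local standalone Elasticsearch node, and a Kafka source matching the question's dependencies (broker address and topic name are assumptions for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("es-structured-streaming")
  .master("local[*]")
  .config("es.nodes", "localhost")   // assumed local Elasticsearch node
  .config("es.port", "9200")
  .getOrCreate()

// Example streaming source; the "orders" topic is an assumption
val ds = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "orders")
  .load()
  .selectExpr("CAST(value AS STRING) AS value")

// The Elasticsearch sink requires a checkpoint location; the index/type follows the question
val query = ds.writeStream
  .outputMode("append")
  .format("org.elasticsearch.spark.sql")
  .option("checkpointLocation", "/tmp/es-checkpoint")
  .start("spark/orders")

query.awaitTermination()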