Spark Structured Streaming Elasticsearch integration issue: data source does not support streamed writing

Asked: 2017-09-26 07:28:23

Tags: apache-spark-sql spark-streaming

I am writing a Spark Structured Streaming application in which the data processed by Spark needs to be sunk into Elasticsearch.

This is my development environment, so I am running a standalone Elasticsearch instance.

I have tried the following two approaches to sink the data from the Dataset into ES.

1. ds.writeStream().format("org.elasticsearch.spark.sql").start("spark/orders");
2. ds.writeStream().format("es").start("spark/orders");

In both cases I get the following error:

Caused by:

java.lang.UnsupportedOperationException: Data source es does not support streamed writing
at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:287) ~[spark-sql_2.11-2.1.1.jar:2.1.1]
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:272) ~[spark-sql_2.11-2.1.1.jar:2.1.1]
at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:213) ~[spark-sql_2.11-2.1.1.jar:2.1.1]

pom.xml:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.1.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.mongodb.spark</groupId>
        <artifactId>mongo-spark-connector_2.11</artifactId>
        <version>2.0.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.1.1</version>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming-kafka_2.11</artifactId>
        <version>1.6.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
        <version>2.1.1</version>
    </dependency>

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-20_2.11</artifactId>
        <version>5.6.1</version>
    </dependency>


Appreciate any help in resolving this issue.

3 Answers:

Answer 0 (score: 1)

You can try:

    ds.write
      .format("org.elasticsearch.spark.sql")
      .option("es.resource", ES_INDEX + "/" + ES_TYPE)
      .option("es.mapping.id", ES_ID)
      .mode("overwrite")
      .save()

Answer 1 (score: 1)

The Elasticsearch sink does not support streamed writing, which means you cannot stream your output directly to Elasticsearch. You can write the streaming output to Kafka and use Logstash to read from Kafka into Elasticsearch.
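As an illustration (not part of the original answer), here is a minimal sketch of the Spark-to-Kafka leg of that workaround. It assumes Spark 2.2+ (the Kafka sink for Structured Streaming is not available in 2.1.x), the spark-sql-kafka-0-10 artifact already present in the pom, a broker at localhost:9092, and a hypothetical topic name and checkpoint path.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.streaming.StreamingQuery;

    import static org.apache.spark.sql.functions.*;

    // The Kafka sink expects a string/binary "value" column, so serialize each row to JSON first.
    Dataset<Row> toKafka = ds.select(to_json(struct(col("*"))).alias("value"));

    StreamingQuery query = toKafka.writeStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092")   // assumed broker address
            .option("topic", "orders")                              // assumed topic name
            .option("checkpointLocation", "/tmp/checkpoints/kafka") // required for streaming sinks
            .start();

On the Logstash side, the kafka input plugin together with the elasticsearch output plugin can then move the messages from that topic into the index.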

Answer 2 (score: 0)

Update:

With Spark 2.2.0 and Elasticsearch 6.x, streamed writing to Elasticsearch is now supported.

Dependency:

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-20_2.11</artifactId>
        <version>6.2.4</version>
    </dependency>

writeStream code:
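The snippet itself is not shown above; as a stand-in, a minimal sketch (not from the original answer) of a streaming write with elasticsearch-spark-20 6.x on Spark 2.2+, assuming a standalone Elasticsearch on localhost and reusing the spark/orders resource from the question. The checkpoint path is a placeholder.

    import org.apache.spark.sql.streaming.StreamingQuery;

    StreamingQuery query = ds.writeStream()
            .format("es")                                        // or the full "org.elasticsearch.spark.sql" name
            .option("checkpointLocation", "/tmp/checkpoints/es") // required for structured streaming sinks
            .option("es.nodes", "localhost")                     // assumed standalone ES node
            .start("spark/orders");                              // target index/type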