Question

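I am very new to Elasticsearch. I am using elasticsearch-hadoop version 6.2.4: I read files from HDFS, convert them to bean objects, and write them to Elasticsearch. I am using Spark Structured Streaming.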

StreamingQuery query = dataSet
                        .writeStream()
                        .format("org.elasticsearch.spark.sql")
                        //.outputMode(OutputMode.Append())
                        .option("checkpointLocation", "/tmp/ckpt1")
                        .option("es.nodes","abc.dev.cm.par.xy.hp")
                        .option("es.port","9200")
                        .option("es.mapping.id", "CustomerID")
                        .option("es.resource", "testIndex/testType")
                        .start();

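When writing, I use one field of the POJO class (CustomerID) as the mapping ID. Can we use multiple fields, or a combination of fields, as the mapping ID? For example, my file contains customer ID and order ID fields. Can we combine these two fields into something like CustomerID + OrderID?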

Answer 1

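No, you cannot set multiple properties in "es.mapping.id". What you can do is build whatever composite ID you want, add it to the DataFrame as a column, and use that column.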

Answer 2

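According to the Elastic documentation, the mapping ID option takes a single column name, so you cannot set multiple columns as the ID. You can work around this by creating a new column with a combined value and pointing es.mapping.id at it: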

// requires: import static org.apache.spark.sql.functions.*;
dataSet.withColumn("id", concat(col("CustomerID"), col("OrderID")))
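
One thing to watch with plain concatenation is that different pairs can produce the same ID. The plain-Java sketch below illustrates this and one possible fix, joining with a separator. The class name CompositeIds, the method join, and the "_" separator are hypothetical and chosen only for illustration; it also assumes both IDs are strings that never contain the separator. Inside a DataFrame, functions.concat_ws offers the same idea.

```java
import java.util.Objects;

// Hypothetical helper, not part of Spark or elasticsearch-hadoop.
final class CompositeIds {
    // Assumption: this separator never occurs inside either ID.
    private static final String SEPARATOR = "_";

    private CompositeIds() {}

    // Builds one ID string from the two fields, rejecting nulls early.
    static String join(String customerId, String orderId) {
        Objects.requireNonNull(customerId, "customerId");
        Objects.requireNonNull(orderId, "orderId");
        return customerId + SEPARATOR + orderId;
    }

    public static void main(String[] args) {
        // Plain concatenation: ("12", "3") and ("1", "23") both give "123".
        System.out.println(("12" + "3").equals("1" + "23"));           // true
        // With a separator the two pairs stay distinct.
        System.out.println(join("12", "3").equals(join("1", "23")));   // false
    }
}
```

The resulting string could be stored in an extra bean property or column, with es.mapping.id set to that property's name.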

Answer 3

Alternatively, you can use the sha2 function to generate a hash ID after concatenating the columns.
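
To show what such a hashed ID looks like, here is a minimal plain-Java sketch that joins the fields and returns the SHA-256 digest as a lowercase hex string. The class name HashedIds, the method sha256Hex, and the "_" separator are hypothetical; the sketch assumes UTF-8 string IDs. In Spark itself, functions.sha2 applied to the output of functions.concat_ws with 256 bits should give a comparable hex string without leaving the DataFrame.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Spark or elasticsearch-hadoop.
final class HashedIds {
    private HashedIds() {}

    // Joins the parts with "_" (an assumed separator) and hashes the result.
    static String sha256Hex(String... parts) {
        String joined = String.join("_", parts);
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(joined.getBytes(StandardCharsets.UTF_8));
            // Render each byte as two lowercase hex digits.
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

A hashed ID has a fixed length (64 hex characters) regardless of how many columns go into it, at the cost of no longer being human-readable.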

Elasticsearch - multiple fields as the mapping ID in Spark
