How do I write to Cosmos DB from a streaming query using the Azure Cosmos DB connector?

Time: 2019-11-20 13:27:18

Tags: scala apache-spark azure-cosmosdb spark-structured-streaming azure-databricks

I have a simple Structured Streaming application whose output sink should be Cosmos DB. When I call the writeStream method, the following error appears. The version of the library added to the cluster is:

com.microsoft.azure:azure-cosmosdb-spark_2.4.0_2.11:1.4.1, type:Maven

My code is as follows:

val outstream = staticInputDF 
  .writeStream
  .format(classOf[CosmosDBSinkProvider].getName)
  .options(config)
  .start
  .awaitTermination

This results in the error:

command-751666472135258:74: error: overloaded method value options with alternatives:
  (options: java.util.Map[String,String])org.apache.spark.sql.streaming.DataStreamWriter[org.apache.spark.sql.Row] <and>
  (options: scala.collection.Map[String,String])org.apache.spark.sql.streaming.DataStreamWriter[org.apache.spark.sql.Row]
cannot be applied to (com.microsoft.azure.cosmosdb.spark.config.Config)

How can I write to a Cosmos DB collection from a streaming DataFrame?

2 answers:

Answer 0: (score: 0)

The following code shows how to write a streaming DataFrame to Cosmos DB. Note that the write configuration is a plain Map, not a Config object, so that it matches one of the .options overloads.

// Write configuration: keep this as a Map[String, String]
// (wrapping it in Config(...) reproduces the error from the question)
val writeConfig = Map(
  "Endpoint" -> "https://doctorwho.documents.azure.com:443/",
  "Masterkey" -> "YOUR-KEY-HERE",
  "Database" -> "DepartureDelays",
  "Collection" -> "flights_fromsea",
  "Upsert" -> "true",
  "WritingBatchSize" -> "500",
  "CheckpointLocation" -> "/checkpointlocation_write1"
)

// Write to Cosmos DB from the flights DataFrame
import com.microsoft.azure.cosmosdb.spark.streaming.CosmosDBSinkProvider

df
  .writeStream
  .format(classOf[CosmosDBSinkProvider].getName)
  .options(writeConfig)
  .start()

Reference: Azure Databricks Spark Connector

Hope this helps.

Answer 1: (score: 0)

The error indicates that `config` is of type com.microsoft.azure.cosmosdb.spark.config.Config, but `.options(...)` can only be used with a `java.util.Map[String,String]` or a `scala.collection.Map[String,String]`.

Check out the Stream data from Kafka to Cosmos DB notebook, where the following Map is used:

val configMap = Map(
    "Endpoint" -> "YOUR_COSMOSDB_ENDPOINT",
    "Masterkey" -> "YOUR_MASTER_KEY",
    "Database" -> "kafkadata",
    // use a ';' to delimit multiple regions
    "PreferredRegions" -> "West US;",
    "Collection" -> "kafkacollection"
)
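Putting the two pieces together, the question's snippet would then pass this plain Map to `.options`, which resolves the overload error. This is a sketch against the asker's own `staticInputDF`; the import path assumes the sink provider shipped in the azure-cosmosdb-spark artifact named in the question, and the checkpoint path is an illustrative placeholder, not one from the original post.

```scala
// Assumed import from azure-cosmosdb-spark_2.4.0_2.11:1.4.1
import com.microsoft.azure.cosmosdb.spark.streaming.CosmosDBSinkProvider

staticInputDF
  .writeStream
  .format(classOf[CosmosDBSinkProvider].getName)
  // configMap is a Map[String, String], so it matches the
  // (options: scala.collection.Map[String,String]) overload
  .options(configMap)
  // hypothetical checkpoint path for illustration
  .option("checkpointLocation", "/tmp/cosmos-checkpoint")
  .start()
  .awaitTermination()
```

The key difference from the question's code is that `configMap` is never wrapped in `Config(...)`; `Config` is used by the batch APIs such as `CosmosDBSpark.save`, while `DataStreamWriter.options` only accepts string maps.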