Question

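I am trying to use Azure Data Factory to move JSON or CSV files from my Blob Storage into my Cosmos DB graph database.

By formatting the JSON file correctly I was able to upload vertices, but I can't figure out how to create edges. Hard-coding the edges into the JSON file doesn't work. Here is one of the vertices from my JSON file: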

{
  "id": "o0001",
  "label": "Order",
  "type": "vertex",
  "Product2": 1.0,
  "Product3": 1.0,
  "Product4": 1.0,
  "Product5": 1.0,
  "Product6": 1.0,
  "Product7": 1.0,
  "Product8": 1.0,
  "Product24": 1.0,
  "Product25": 1.0,
  "Product26": 1.0,
  "Product27": 1.0
}

And here is an edge:

{
  "label": "purchased",
  "type": "edge",
  "inVLabel": "Product",
  "outVLabel": "Order",
  "inV": "Product2",
  "outV": "o0001"
}

Everything gets imported as a vertex. Does anyone know how to upload both vertices and edges?

Answer 1

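You can convert the JSON into a DataFrame and then follow the steps below to add the records to Cosmos DB, where one row in the DataFrame corresponds to one vertex in Cosmos DB.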
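The JSON-to-DataFrame conversion mentioned above might look roughly like this. It is only a sketch: the temp file stands in for the Blob Storage file, the shortened vertex is taken from the question, and the names `session`, `sample`, `jsonPath`, and `vertices` are made up for illustration. It assumes Spark runs in local mode so the local file is readable.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// spark-shell already provides a session; getOrCreate() reuses it.
// The local master only takes effect when no session exists yet.
val session = SparkSession.builder()
  .appName("json-to-dataframe")
  .master("local[*]")
  .getOrCreate()

// Stand-in for the Blob Storage file (assumption): a local temp file
// holding a shortened version of the vertex from the question.
val sample = """{"id": "o0001", "label": "Order", "type": "vertex", "Product2": 1.0, "Product3": 1.0}"""
val jsonPath = Files.createTempFile("vertices", ".json")
Files.write(jsonPath, sample.getBytes("UTF-8"))

// Spark expects one JSON object per line by default; for pretty-printed
// files, add .option("multiLine", "true") before .json(...).
val vertices = session.read.json(jsonPath.toUri.toString)

vertices.printSchema()
vertices.show()
```

For a real run, `jsonPath` would point at the file in Blob Storage instead, which needs the storage account configured for Spark; that setup is outside what the answer covers.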

Open the link below, choose the download option, select the uber.jar, and then add your dependencies: https://search.maven.org/artifact/com.microsoft.azure/azure-cosmosdb-spark_2.3.0_2.11/1.2.2/jar

spark-shell --master yarn --executor-cores 5 --executor-memory 10g --num-executors 10 --driver-memory 10g --jars "path/to/jar/dependency/azure-cosmosdb-spark_2.3.0_2.11-1.2.2-uber.jar" --packages "com.google.guava:guava:18.0,com.google.code.gson:gson:2.3.1,com.microsoft.azure:azure-documentdb:1.16.1"

Here is the code:

import org.apache.spark.sql.types._
import org.apache.spark.sql.Row

// Sample rows; each row becomes one vertex document in Cosmos DB.
val data = Seq(
  Row(2, "Abb"),
  Row(4, "Bcc"),
  Row(6, "Cdd")
)

// Column names and types for the rows above.
val schema = List(
  StructField("partitionKey", IntegerType, true),
  StructField("name", StringType, true)
)

val DF = spark.createDataFrame(
  spark.sparkContext.parallelize(data),
  StructType(schema)
)

// Connection and write settings for the Cosmos DB Spark connector.
val writeConfig = Map(
  "Endpoint" -> "https://*******.documents.azure.com:443/",
  "Masterkey" -> "**************",
  "Database" -> "db_name",
  "Collection" -> "collection_name",
  "Upsert" -> "true",
  "query_pagesize" -> "100000",
  "bulkimport" -> "true",
  "WritingBatchSize" -> "1000",
  "ConnectionMaxPoolSize" -> "100",
  "partitionkeydefinition" -> "/partitionKey"
)

// Write the DataFrame to the target collection.
DF.write.format("com.microsoft.azure.cosmosdb.spark").mode("overwrite").options(writeConfig).save()
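The code above only writes vertices. For the edges the question asks about, one possible approach is to shape each edge as a document that carries the fields Cosmos DB appears to use internally for graph edges. The sketch below rests on an assumption that this answer does not confirm: that those fields are `_isEdge`, `_vertexId`, `_vertexLabel`, `_sink`, and `_sinkLabel`. The `id` scheme and the names `session`, `edges`, and `edgeDocs` are hypothetical as well. Verify the field names against your own collection before relying on them.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat_ws, lit}

// Reuses the spark-shell session if one exists; local master is a fallback.
val session = SparkSession.builder()
  .appName("edges-sketch")
  .master("local[*]")
  .getOrCreate()
import session.implicits._

// The edge from the question as one row: label, source vertex, target vertex.
val edges = Seq(("purchased", "o0001", "Order", "Product2", "Product"))
  .toDF("label", "outV", "outVLabel", "inV", "inVLabel")

// ASSUMPTION: the underscore-prefixed names mirror the internal edge document
// layout of the Cosmos DB graph API; they are not taken from the answer above.
val edgeDocs = edges.select(
  concat_ws("-", col("outV"), col("label"), col("inV")).as("id"), // hypothetical id scheme
  col("label"),
  lit(true).as("_isEdge"),
  col("outV").as("_vertexId"),         // source vertex id
  col("outVLabel").as("_vertexLabel"), // source vertex label
  col("inV").as("_sink"),              // target vertex id
  col("inVLabel").as("_sinkLabel")     // target vertex label
)

edgeDocs.show(false)
```

If the layout holds, `edgeDocs` could be written with the same `writeConfig` as the vertices. On a partitioned collection the edge documents would probably also need the partition key value of the source vertex, which this sketch leaves out.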

Hope this helps.
