Question

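I would like to quickly insert multiple vertices using the Azure Cosmos DB Graph API. Most of the current Microsoft samples create vertices one at a time and execute a separate Gremlin query for each one, like this: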

IDocumentQuery<dynamic> query = client.CreateGremlinQuery<dynamic>(graph, "g.addV('person').property('id', 'thomas').property('name', 'Thomas').property('age', 44)");

while (query.HasMoreResults)
{
    foreach (dynamic result in await query.ExecuteNextAsync())
    {
        Console.WriteLine($"\t {JsonConvert.SerializeObject(result)}");
    }
    Console.WriteLine();
}


query = client.CreateGremlinQuery<dynamic>(graph, "g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39)");

while (query.HasMoreResults)
{
    foreach (dynamic result in await query.ExecuteNextAsync())
    {
        Console.WriteLine($"\t {JsonConvert.SerializeObject(result)}");
    }
    Console.WriteLine();
}

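However, this is less than ideal when I want to create a few thousand vertices and edges to initially populate a graph, because it can take quite a while.

This is with the Microsoft.Azure.Graphs library, v0.2.0-preview.

How can I efficiently add multiple vertices to Cosmos DB at once, so that I can query them later using Graph API syntax?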

Answer 1

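I have found that the fastest way to seed a graph is actually to use the Document API. With this technique I was able to insert 5,500 vertices/edges per second on a single development machine. The trick is to understand the format Cosmos expects for edges and vertices. Add a few vertices and edges to the graph through the Gremlin API, then inspect the format of those documents by going to the Data Explorer in Azure and running the document query SELECT * FROM c.

At work I built a lightweight ORM that uses reflection to take POCOs for edges and vertices and convert them into the format you see in the portal. I hope to open-source it soon, at which point I will most likely publish a NuGet package and an accompanying blog post. In the meantime, I hope this points you in the right direction. Feel free to reach out if you have more questions about this approach.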
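
As a rough illustration of the kind of conversion described above, here is a minimal JavaScript sketch. The document shape it produces (vertex properties stored as arrays of `{ id, _value }` entries; edges marked with `_isEdge` and carrying `_vertexId`, `_vertexLabel`, `_sink` and `_sinkLabel`) is an assumption about what a `SELECT * FROM c` query typically shows, and the helper names `toVertexDocument` and `toEdgeDocument` are hypothetical. Compare the output with the documents in your own Data Explorer before relying on it, and add your partition key property if your collection uses one.

```javascript
const { randomUUID } = require('crypto');

// Hypothetical helper: turn a plain object into a vertex document.
// Assumed shape: each property is stored as an array of { id, _value } entries.
function toVertexDocument(id, label, properties) {
  const doc = { id, label };
  for (const [key, value] of Object.entries(properties)) {
    doc[key] = [{ id: randomUUID(), _value: value }];
  }
  return doc;
}

// Hypothetical helper: turn a relationship into an edge document.
// Assumed shape: edges carry _isEdge plus source (_vertexId) and target (_sink)
// information, and edge properties are stored as plain values.
function toEdgeDocument(id, label, source, target, properties = {}) {
  return {
    id,
    label,
    _isEdge: true,
    _vertexId: source.id,
    _vertexLabel: source.label,
    _sink: target.id,
    _sinkLabel: target.label,
    ...properties,
  };
}

const thomas = toVertexDocument('thomas', 'person', { name: 'Thomas', age: 44 });
const mary = toVertexDocument('mary', 'person', { name: 'Mary', lastName: 'Andersen', age: 39 });
const knows = toEdgeDocument('thomas-knows-mary', 'knows', thomas, mary);

console.log(JSON.stringify([thomas, mary, knows], null, 2));
```

The resulting objects could then be written with whichever Document API client you use for bulk inserts.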

Answer 2

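Assuming Cosmos DB is 100% TinkerPop-compliant, and depending on the Gremlin executor timeout setting, you should be able to change your Gremlin script so that it performs multiple operations at once.

For example: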

g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39)

can be turned into:

g.addV('person').property('id', 'mary').property('name', 'Mary').property('lastName', 'Andersen').property('age', 39); g.addV('person').property('id', 'david').property('name', 'David').property('lastName', 'P').property('age', 24);

and so on.

Your Gremlin script is also just Groovy code, so you can even write loops and the like to create vertices, append properties, and so on.
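
If the script is generated on the client instead, a sketch along the following lines could build the multi-statement string shown above from a list of objects. The helper names (`toLiteral`, `addVertexStatement`, `buildScript`) are hypothetical, and whether the server accepts several semicolon-separated statements in one request is subject to the same TinkerPop-compliance assumption stated at the start of this answer.

```javascript
// Render a JavaScript value as a Gremlin/Groovy literal.
// Strings are single-quoted with backslashes and quotes escaped;
// numbers and booleans are passed through unchanged.
function toLiteral(value) {
  if (typeof value === 'string') {
    return "'" + value.replace(/\\/g, '\\\\').replace(/'/g, "\\'") + "'";
  }
  return String(value);
}

// Build one addV(...) statement for a vertex with the given label and properties.
function addVertexStatement(label, properties) {
  const steps = Object.entries(properties)
    .map(([key, value]) => `.property(${toLiteral(key)}, ${toLiteral(value)})`)
    .join('');
  return `g.addV(${toLiteral(label)})${steps}`;
}

// Join many statements into a single script, separated by semicolons.
function buildScript(label, vertices) {
  return vertices.map((v) => addVertexStatement(label, v)).join('; ');
}

const people = [
  { id: 'mary', name: 'Mary', lastName: 'Andersen', age: 39 },
  { id: 'david', name: 'David', lastName: 'P', age: 24 },
];

const script = buildScript('person', people);
console.log(script);
```

Escaping string values matters here because the data is spliced directly into the script text. Very large scripts may also run into the executor timeout mentioned above, so splitting the input into several smaller scripts may be necessary.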

Answer 3

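We needed a tool to help us migrate data into a Cosmos DB graph, but since nothing was available, I ended up creating this one: https://github.com/abbasc52/graphdb-migration-tool

You can use it to pull data from a SQL source or from JSON, transform it, and push it into a graph database. It supports parallel execution of Gremlin queries, so it is quite fast.
By default it fires 10 Gremlin queries in parallel, but you can increase that by passing batchSize in the graph-config file.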
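
The general idea of running queries in parallel batches can be sketched as follows. This is not the tool's actual implementation, only an illustration of the technique; `runInBatches`, `chunk` and `fakeSubmit` are hypothetical names, and `submit` stands in for whatever function sends a single Gremlin query to the server.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Run the queries batch by batch: all queries in a batch are submitted in
// parallel, and the next batch starts only after the current one has finished.
// `submit` is a placeholder for the function that sends one Gremlin query.
async function runInBatches(queries, submit, batchSize = 10) {
  const results = [];
  for (const batch of chunk(queries, batchSize)) {
    const batchResults = await Promise.all(batch.map((query) => submit(query)));
    results.push(...batchResults);
  }
  return results;
}

// Example with a fake submit function that just echoes the query.
const queries = Array.from(
  { length: 25 },
  (_, i) => `g.addV('person').property('id', 'p${i}')`
);
const fakeSubmit = async (query) => ({ query, ok: true });

runInBatches(queries, fakeSubmit, 10).then((results) => {
  console.log(`Executed ${results.length} queries`);
});
```

A larger batch size means more requests in flight at once, which can improve throughput until the database starts throttling requests.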

Answer 4

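The Data Migration Tool may support the SQL API or MongoDB scenarios, but at this stage it does not support Graph API vertices and edges. As mentioned earlier, you can use the output of a graph query as your primary reference schema and then run some search-and-replace operations on the source to get it into the correct final format. However, I found that simply running a console application that streams the data in may be a better fit.

I was able to reuse the same console application for the Marvel scenario and for the airport-flights scenario; all I had to do each time was modify a few lines of code. The code runs in two sequences. The first block extracts and transforms the vertices. The second extracts the field relationships and transforms them into edges. The only thing I needed to change was which fields to extract. The conversion can take some time depending on the size of the data, but it gave me exactly the expected result every time, without having to keep modifying the data at the source.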
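
The console application itself is not shown, but the two-sequence structure it describes can be sketched roughly as below. The sample rows, the field names (`from`, `to`, `airline`) and the function names are all made up for illustration; only the vertices-first, edges-second structure comes from the description above.

```javascript
// Hypothetical source rows, e.g. loaded from a CSV file or a SQL query.
const flights = [
  { from: 'SEA', to: 'JFK', airline: 'AA' },
  { from: 'JFK', to: 'LHR', airline: 'BA' },
  { from: 'SEA', to: 'LHR', airline: 'BA' },
];

// Sequence 1: extract the distinct airports and turn each into a vertex.
function extractVertices(rows) {
  const codes = new Set();
  for (const row of rows) {
    codes.add(row.from);
    codes.add(row.to);
  }
  return [...codes].map((code) => ({ id: code, label: 'airport' }));
}

// Sequence 2: turn the relationship fields of each row into an edge.
function extractEdges(rows) {
  return rows.map((row, index) => ({
    id: `flight-${index}`,
    label: 'flight',
    from: row.from,
    to: row.to,
    properties: { airline: row.airline },
  }));
}

const vertices = extractVertices(flights);
const edges = extractEdges(flights);
console.log(`${vertices.length} vertices, ${edges.length} edges`);
```

Adapting this to another data set would mostly mean changing which fields the two functions read, which matches the "modify a few lines" experience described above.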

Answer 5

I am using this code to upsert multiple vertices from Node.js:

const gremlin = require('gremlin');
const __ = gremlin.process.statics;

// `g` is assumed to be an already configured graph traversal source.
// Upsert two vertices in one traversal: update 'runways' if the vertex exists, otherwise create it.
const result = await g.withBulk(true).V('test-3').fold().coalesce(__.unfold().property(gremlin.process.cardinality.single, 'runways', 4), __.addV('truongtest').property(gremlin.process.t.id, 'test-3').property(gremlin.process.cardinality.single, 'runways', 4))
        .V('test-10').fold().coalesce(__.unfold().property(gremlin.process.cardinality.single, 'runways', 100), __.addV('truongtest').property(gremlin.process.t.id, 'test-10').property(gremlin.process.cardinality.single, 'runways', 100))
        .next();

// If you want to add a lot of vertices, build up the traversal step by step (for example in a loop)

let trt = g.withBulk(true);
trt = trt.V('test-3').fold().coalesce(__.unfold().property(gremlin.process.cardinality.single, 'runways', 4), __.addV('truongtest').property(gremlin.process.t.id, 'test-3').property(gremlin.process.cardinality.single, 'runways', 4));

trt = trt.V('test-10').fold().coalesce(__.unfold().property(gremlin.process.cardinality.single, 'runways', 100), __.addV('truongtest').property(gremlin.process.t.id, 'test-10').property(gremlin.process.cardinality.single, 'runways', 100));

// Once all steps have been added, call next() to execute the traversal
await trt.next();

Inserting multiple vertices at once with the Cosmos DB Graph API
