Question

Suppose I have the following data structure:

public class Account
{
    public int AccountID { get; set; }
    public string Name { get; set; }
}

public class Person
{
    public int PersonID { get; set; }
    public string Name { get; set; }
    public List<Account> Accounts { get; set; }
}

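I want to use Data Factory to move the data from a SQL Server database to Azure Cosmos DB. For each person, I want to create a JSON document that contains the accounts as nested objects, like this: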

"PersonID": 1,
"Name": "Jim",
"Accounts": [{
    "AccountID": 1,
    "PersonID": 1,
    "Name": "Home"
},
{
    "AccountID": 2,
    "PersonID": 1,
    "Name": "Work"
}]

I wrote a stored procedure to retrieve my data. To include the accounts as nested objects, I convert the result of the SQL query to JSON:

select (select *
from Person p join Account Accounts on Accounts.PersonID = p.PersonID
for json auto) as JsonResult

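Unfortunately, my data ends up copied into a single field instead of the proper object structure.

Does anyone know how I can do this?

Edit: There is a similar question here, but I couldn't find a good answer: Is there a way to insert a document with a nested array in Azure Data Factory?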

Answer 1

For anyone in the same situation: I ended up writing a .NET application that reads the entries from the database and imports them using the SQL API.

https://docs.microsoft.com/en-us/azure/cosmos-db/create-sql-api-dotnet

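This approach is somewhat slow for large imports, because it has to serialize each object and then import them one at a time. A faster approach I found later is the bulk executor library, which lets you import JSON in bulk without serializing it first: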

https://github.com/Azure/azure-cosmosdb-bulkexecutor-dotnet-getting-started

https://docs.microsoft.com/en-us/azure/cosmos-db/bulk-executor-overview

Edit

After installing the NuGet package Microsoft.Azure.CosmosDB.BulkExecutor:

var documentClient = new DocumentClient(new Uri(connectionConfig.Uri), connectionConfig.Key);
var dataCollection = documentClient.CreateDocumentCollectionQuery(UriFactory.CreateDatabaseUri(database))
    .Where(c => c.Id == collection)
    .AsEnumerable()
    .FirstOrDefault();

var bulkExecutor = new BulkExecutor(documentClient, dataCollection);
await bulkExecutor.InitializeAsync();

Then import the documents:

var response = await bulkExecutor.BulkImportAsync(documents);
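Whichever import path is used, the documents need to be shaped as one object per person with the accounts nested inside it. The following is only a minimal sketch of that step, not the code from the application described above. It assumes the database returns one flat row per person/account pair. The PersonAccountRow and PersonDocumentBuilder names are hypothetical, and System.Text.Json is simply one serializer that could be used here. The Person and Account classes mirror the ones in the question, with PersonID added to Account to match the sample JSON.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical flat row: one per (person, account) pair, as a SQL join would return it.
public class PersonAccountRow
{
    public int PersonID { get; set; }
    public string PersonName { get; set; }
    public int AccountID { get; set; }
    public string AccountName { get; set; }
}

public class Account
{
    public int AccountID { get; set; }
    public int PersonID { get; set; }
    public string Name { get; set; }
}

public class Person
{
    public int PersonID { get; set; }
    public string Name { get; set; }
    public List<Account> Accounts { get; set; }
}

// Hypothetical helper, not part of any Cosmos DB library.
public static class PersonDocumentBuilder
{
    // Groups the flat rows by person and nests each person's accounts under that person.
    public static List<Person> Build(IEnumerable<PersonAccountRow> rows)
    {
        return rows
            .GroupBy(r => new { r.PersonID, r.PersonName })
            .Select(g => new Person
            {
                PersonID = g.Key.PersonID,
                Name = g.Key.PersonName,
                Accounts = g.Select(r => new Account
                {
                    AccountID = r.AccountID,
                    PersonID = r.PersonID,
                    Name = r.AccountName
                }).ToList()
            })
            .ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample rows matching the example in the question.
        var rows = new[]
        {
            new PersonAccountRow { PersonID = 1, PersonName = "Jim", AccountID = 1, AccountName = "Home" },
            new PersonAccountRow { PersonID = 1, PersonName = "Jim", AccountID = 2, AccountName = "Work" }
        };

        foreach (var person in PersonDocumentBuilder.Build(rows))
        {
            // One JSON document per person, with Accounts as a nested array.
            Console.WriteLine(JsonSerializer.Serialize(person));
        }
    }
}
```

The resulting objects, or their JSON strings, could then be collected into the documents list that is passed to the import call. Cosmos DB also expects each document to carry an id, which this sketch leaves out.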

Copy nested objects from SQL Server to Azure Cosmos DB using Data Factory