How to import documents with nested collections from SQL Server into DocumentDB using the Data Migration Tool

Asked: 2018-02-16 20:42:18

Tags: azure-cosmosdb

I am trying to import from SQL Server 2014 into DocumentDB with the Cosmos DB Data Migration Tool. Here is a sample SELECT statement:

SELECT 
Sales.SalesOrderNumber AS [ID]
, Product.ProductName AS [Product.Name]
, Product.UnitPrice AS [Product.Price]
, Sales.SalesQuantity AS [Product.Quantity]
FROM ContosoRetailDW.dbo.FactOnlineSales AS Sales 
JOIN ContosoRetailDW.dbo.DimProduct AS Product ON Product.ProductKey = Sales.ProductKey
WHERE Sales.SalesOrderNumber IN ('20070326214955','20070220416329')
ORDER BY Sales.SalesOrderNumber;

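Below is a sample rowset from the query above. I added the Product. prefix to the columns that come from DimProduct because I want Product to be a nested collection.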

ID                   Product.Name                                     Product.Price         Product.Quantity
-------------------- ------------------------------------------------ --------------------- ----------------
20070207721039       MGS Hand Games women M400 Yellow                8.99                  1
20070207721039       Adventure Works 26" 720p LCD HDTV M140 Silver   469.97                1
20070326214955       Adventure Works 20" Analog CRT TV E45 Brown     200                   1
20070326214955       Contoso 4G MP3 Player E400 Silver               59.99                 1

Given a rowset like the one above, here is an example of how I would like the JSON documents to be shaped:

[
  {
    "ID": "20070220416329",
    "Products": [
      {
        "ProductName": "Contoso Mini Battery Charger Kit E320 Silver",
        "Price": 24.99,
        "Quantity": 1
      },
      {
        "ProductName": "Adventure Works 26\" 720p LCD HDTV M140 Silver",
        "Price": 469.97,
        "Quantity": 1
      }
    ]
  },
  {
    "ID": "20070326214955",
    "Products": [
      {
        "ProductName": "Adventure Works 20\" Analog CRT TV E45 Brown",
        "Price": 200,
        "Quantity": 1
      },
      {
        "ProductName": "Contoso 4G MP3 Player E400 Silver",
        "Price": 59.99,
        "Quantity": 1
      }
    ]
  }
]

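The problem is that every row is inserted as a separate document, so I get four documents instead of two, and Product ends up as a nested document rather than a nested collection.

How can I accomplish what I am trying to do?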

1 Answer:

Answer 0 (score: 1)

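Based on the example in the official documentation, the Nesting Separator property is used to create hierarchical relationships (sub-documents) during import. However, it does not seem to support generating arrays.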
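To make the limitation concrete, here is a small Python sketch that emulates what a nesting separator of `.` appears to do to each row. It is only an illustration of the observed behavior, not the tool's actual code; `nest_row` is a hypothetical helper, and the two sample rows are taken from the question's rowset.

```python
def nest_row(row, separator="."):
    """Turn flat column names like 'Product.Name' into nested sub-documents."""
    doc = {}
    for column, value in row.items():
        parts = column.split(separator)
        target = doc
        # Walk (or create) the intermediate sub-documents for each prefix.
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return doc


rows = [
    {"ID": "20070326214955",
     "Product.Name": 'Adventure Works 20" Analog CRT TV E45 Brown',
     "Product.Price": 200, "Product.Quantity": 1},
    {"ID": "20070326214955",
     "Product.Name": "Contoso 4G MP3 Player E400 Silver",
     "Product.Price": 59.99, "Product.Quantity": 1},
]

# Each row is converted on its own, so two rows that share an ID still become
# two documents, each holding a single Product sub-document, not a shared array.
documents = [nest_row(row) for row in rows]
```

Because the conversion is row by row, nothing in it can merge rows that share an ID, which matches the behavior described in the question.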

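So I suggest that you query the data from the SQL database yourself, assemble it into the JSON shape you want, write it to a JSON file, and then import that JSON file directly with the Data Migration Tool.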

Python code to query the data:

import pyodbc
import os
from os.path import join as pjoin
import json

cnxn = pyodbc.connect('DRIVER={ODBC Driver 13 for SQL Server};SERVER=***.database.windows.net;DATABASE=***;UID=***;PWD=***')

cursor = cnxn.cursor()

cursor.execute("select * from dbo.test")
rowList = cursor.fetchall()

My sample JSON data:

[{
    "name": "jay",
    "courses": [{
        "course": "maths",
        "score": 100
    }, {
        "course": "history",
        "score": 80
    }]
}, {
    "name": "peter",
    "courses": [{
        "course": "maths",
        "score": 100
    }, {
        "course": "history",
        "score": 80
    }]
}]

Java code to assemble the JSON data:

        // Assumes rs is a java.sql.ResultSet ordered by name, and that JSONObject
        // comes from your JSON library of choice (needs java.util.List and ArrayList).
        List<JSONObject> list = new ArrayList<>();
        String currentName = null;
        JSONObject obj = null;
        List<JSONObject> courses = null;

        while (rs.next()) {
            String name = rs.getString("name");
            String course = rs.getString("course");
            int score = rs.getInt("score");

            if (!name.equals(currentName)) {
                // A new name starts a new document; flush the previous one first.
                if (obj != null) {
                    obj.put("courses", courses);
                    list.add(obj);
                }
                obj = new JSONObject();
                obj.put("name", name);
                courses = new ArrayList<>();
                currentName = name;
            }

            // Every row contributes one entry to the current document's array.
            JSONObject objSub = new JSONObject();
            objSub.put("course", course);
            objSub.put("score", score);
            courses.add(objSub);
        }
        // Flush the last document, which the loop itself never adds.
        if (obj != null) {
            obj.put("courses", courses);
            list.add(obj);
        }
        return list;
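If you would rather stay in Python after the query snippet, the same grouping can be sketched with `itertools.groupby`. This is a minimal sketch under some assumptions: `group_rows` is a hypothetical helper, each row is taken to be a `(name, course, score)` sequence in that column order (which `select *` does not guarantee), and the rows are assumed to be sorted by name already, for example with `ORDER BY name`.

```python
from itertools import groupby
from operator import itemgetter


def group_rows(rows):
    """Group (name, course, score) rows into one document per name.

    Assumes the rows are already sorted by name; groupby only merges
    neighbouring rows that share a key.
    """
    documents = []
    for name, group in groupby(rows, key=itemgetter(0)):
        courses = [{"course": row[1], "score": row[2]} for row in group]
        documents.append({"name": name, "courses": courses})
    return documents


# Stand-in for cursor.fetchall(); plain tuples behave like rows indexed by position.
rows = [
    ("jay", "maths", 100),
    ("jay", "history", 80),
    ("peter", "maths", 100),
    ("peter", "history", 80),
]
documents = group_rows(rows)
```

With these sample rows, `documents` has the same shape as the sample JSON data above: one object per name, each with a `courses` array.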

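Write the data to a JSON file:

# Uses json and pjoin from the imports in the query snippet above.
# Placeholder data; replace it with the documents you assembled.
name_emb = [{'name': 'jay', 'courses': [{'course': 'maths', 'score': 100}]}]
output_dir = 'E:/'
# Open in 'w' mode so the file is created or overwritten; appending to an
# existing file would leave it with more than one JSON value, which is invalid.
with open(pjoin(output_dir, 'test.json'), 'w') as fw:
    json.dump(name_emb, fw)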
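Before pointing the Data Migration Tool at the file, it may be worth reading it back to confirm that it parses and has the expected shape. The sketch below assumes the tool's JSON file source accepts a file containing an array of documents, which is my understanding of it; `write_documents` and `load_documents` are hypothetical helpers, and the temp directory path is only for illustration.

```python
import json
import os
import tempfile


def write_documents(documents, path):
    """Write the documents as a single JSON array, replacing any existing file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(documents, f, ensure_ascii=False, indent=2)


def load_documents(path):
    """Read the file back and check that it is a JSON array of objects."""
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)
    if not isinstance(loaded, list) or not all(isinstance(d, dict) for d in loaded):
        raise ValueError("expected a JSON array of objects")
    return loaded


documents = [{"name": "jay", "courses": [{"course": "maths", "score": 100}]}]
path = os.path.join(tempfile.gettempdir(), "test.json")
write_documents(documents, path)
round_tripped = load_documents(path)
```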

Hope this helps.