Question

我在Azure SQL Server数据库中有一个表，它只是一个nvarchar(max)列，其中包含JSON字符串，如下所示，每行一行展平为一行（约200,000行）

{
    "username": "George84",
    "items": [{
            "type": 8625,
            "score": "90"
        }, {
            "type": 8418,
            "score": "84"
        }, {
            "type": 7818,
            "score": "90"
        }, {
            "type": 23,
            "score": "q"
        }
    ]
}

每条记录都有一个用户名和一个包含一个或多个条目的项目数组。 items数组中的每个条目都包含一个类型字段（整数）和一个得分字段（字符串）。用户名是唯一的。多个用户可能具有相同类型的项目，这些重复项可能具有相同或不同的分数。

我想将其转换为一组新的正确规范化的表，其架构如下所示：

我知道我可以使用外部应用程序来完成它，但我希望利用SQL Server 2016的new JSON processing capability完全在数据库/ TSQL中完成。

完成此转换的最佳方法是什么？

Answer 1

这是我第一次使用Json和T-sql，但是我认为它很有趣，所以我认为可能会有很多更好的解决方案，无论如何，让我们开始吧。

首先，我需要一些数据作为开始，因此我将生成一些临时数据，如下所示：-

declare @jsonData Table (jsonText nvarchar(max))

insert into @jsonData(jsonText)values(N'{
    "username": "George84",
    "items": [{
            "type": 8625,
            "score": "90"
        }, {
            "type": 8418,
            "score": "84"
        }, {
            "type": 7818,
            "score": "90"
        }, {
            "type": 23,
            "score": "q"
        }
    ]
}'),(N'{
    "username": "George85",
    "items": [{
            "type": 8625,
            "score": "80"
        }, {
            "type": 8418,
            "score": "90"
        }, {
            "type": 7818,
            "score": "70"
        }, {
            "type": 232,
            "score": "q"
        }
    ]
}'),(N'{
    "username": "George86",
    "items": [{
            "type": 8626,
            "score": "80"
        }, {
            "type": 8418,
            "score": "70"
        }, {
            "type": 7818,
            "score": "90"
        }, {
            "type": 23,
            "score": "q"
        }
    ]
}'),(N'{
    "username": "George87",
    "items": [{
            "type": 8625,
            "score": "90"
        }, {
            "type": 8418,
            "score": "70"
        }, {
            "type": 7818,
            "score": "60"
        }, {
            "type": 23,
            "score": "q"
        }
    ]
}')

这会给我一个模拟您的主数据表的表。

让我们创建一些内存表来存储数据，如下所示：

declare @Users Table (ID int not null IDENTITY(1, 1),username nvarchar(50))
declare @Types Table (ID int not null IDENTITY(1, 1),[type] int)
declare @Scores Table (ID int not null IDENTITY(1, 1),score nvarchar(50))
declare @Items Table (ID int not null IDENTITY(1, 1),UserId int,TypeId int,ScoreId int)

您可能已经创建了上述表格，因此可以替换它们。因此，让我们跳转到将填充表格的t-sql

declare @RowsCount int=(select count(*) from @jsonData)
declare @index int=1
declare @jsonRowData NVARCHAR(MAX)
declare @username NVARCHAR(50)

while(@index<=@RowsCount)
begin
    ;with JsonDataWithSeq as (
        select ROW_NUMBER() over(order by jsonText) [seq],* from @jsonData
    ) select top(1) @jsonRowData=JsonDataWithSeq.jsonText from JsonDataWithSeq where seq=@index

    --select @jsonRowData [jsonRowData],ISJSON(@jsonRowData)  [ISJSON]

    SELECT @username=JSON_VALUE(@jsonRowData, '$.username') 

    if not exists (select * from @Users where username=@username)--no need for this check if names are unique in the list
    insert into @Users (username) values(@username)

    insert into @Types([type])
    SELECT xx.[type] from OPENJSON(@jsonRowData, 'lax $.items') with ([type] int) xx where xx.[type] not in (select t.[type] From @Types t)

    insert into @Scores([score])
    SELECT xx.[score] from OPENJSON(@jsonRowData, 'lax $.items') with ([score] nvarchar(50)) xx where xx.[score] not in (select t.[score] From @Scores t)

    insert into @Items(UserId,TypeId,ScoreId)
    SELECT u.ID [UserID],t.ID [TypeID],s.ID [ScoreID]  from OPENJSON(@jsonRowData, 'lax $.items') with ([type] int,[score] nvarchar(50)) xx
    inner join @Users u on u.username=@username
    inner join @Types t on t.[type]=xx.[type]
    inner join @Scores s on s.score=xx.score

    set @index=@index+1
end

select * from @Users
select * from @Types
select * from @Scores
select * from @Items

就这样，希望对您有所帮助。

对于我提供的数据，我得到了以下结果：-

用户表：-

ID  username
==  ========
1   George84
2   George85
3   George86
4   George87

类型表：-

得分表：-

ID  score
==  ======
1   90
2   84
3   90
4   q
5   80
6   70
7   60

项目表：-

ID      UserId  TypeId  ScoreId
==      ====== =======  ========
1       1       1       1
2       1       1       3
3       1       2       2
4       1       3       1
5       1       3       3
6       1       4       4
7       2       1       5
8       2       2       1
9       2       2       3
10      2       3       6
11      2       5       4
12      3       6       5
13      3       2       6
14      3       3       1
15      3       3       3
16      3       4       4
17      4       1       1
18      4       1       3
19      4       2       6
20      4       3       7
21      4       4       4

将JSON字符串转换为SQL Server中的规范化架构

1 个答案: