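I am working with a Power BI audit log report file. The file has a column named "AuditDate", and that single column holds many fields. I need to split it into multiple columns using SQL.
The column contains values like this: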
AuditDate
------------
"{""Id"":""44de2468"",""RecordType"":20,""CreationTime"":""2018-08-03T12:30:34"",""Operation"":""ViewReport"",""OrganizationId"":""779558"",""UserType"":0,""UserKey"":""FFFA3DA"",""Workload"":""PowerBI"",""UserId"":""john@abc.com"",""ClientIP"":""9.5.3.26"",""UserAgent"":""Mozilla\/5.0 (Windows NT 10.0;"",""Activity"":""ViewReport"",""ItemName"":""Sales"",""WorkSpaceName"":""TeamITO"",""DatasetName"":""Sales1"",""ReportName"":""Sales1"",""WorkspaceId"":""e8eaa0ca"",""ObjectId"":""Sales1"",""DatasetId"":""4c5d-ad45-eb6546"",""ReportId"":""4cb0-99ad-de41b5160c47"",""IsSuccess"":true,""DatapoolRefreshScheduleType"":""None"",""DatapoolType"":""Undefined""}"
Basically, I need to split this column into:
Id        RecordType  CreationTime         Operation   OrganizationId  UserType
------------------------------------------------------------------------------
44de2468  20          2018-08-03T12:30:34  ViewReport  779558          0
Can anyone help with the SQL query?
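Answer 0 (score: 2)
This is quite simple; all you need is a string "splitter" (a.k.a. tokenizer). If you are on SQL Server 2016+ you can use STRING_SPLIT; on a pre-2016 system you can use DelimitedSplit8K on 2005+ or DelimitedSplit8K_LEAD on 2012+. The solution looks like this: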
DECLARE @AuditDate VARCHAR(8000) =
'"{""Id"":""44de2468"",""RecordType"":20,""CreationTime"":""2018-08-03T12:30:34"",""Operation"":""ViewReport"",""OrganizationId"":""779558"",""UserType"":0,""UserKey"":""FFFA3DA"",""Workload"":""PowerBI"",""UserId"":""john@abc.com"",""ClientIP"":""9.5.3.26"",""UserAgent"":""Mozilla\/5.0 (Windows NT 10.0;"",""Activity"":""ViewReport"",""ItemName"":""Sales"",""WorkSpaceName"":""TeamITO"",""DatasetName"":""Sales1"",""ReportName"":""Sales1"",""WorkspaceId"":""e8eaa0ca"",""ObjectId"":""Sales1"",""DatasetId"":""4c5d-ad45-eb6546"",""ReportId"":""4cb0-99ad-de41b5160c47"",""IsSuccess"":true,""DatapoolRefreshScheduleType"":""None"",""DatapoolType"":""Undefined""}"'
SELECT
    -- Key names are spelled exactly as in the data so this also works under a case-sensitive collation.
    Id             = MAX(CASE split.attrib WHEN 'Id'             THEN split.val END),
    RecordType     = MAX(CASE split.attrib WHEN 'RecordType'     THEN split.val END),
    CreationTime   = MAX(CASE split.attrib WHEN 'CreationTime'   THEN split.val END),
    Operation      = MAX(CASE split.attrib WHEN 'Operation'      THEN split.val END),
    OrganizationId = MAX(CASE split.attrib WHEN 'OrganizationId' THEN split.val END),
    UserType       = MAX(CASE split.attrib WHEN 'UserType'       THEN split.val END)
FROM
(
    -- Split on commas, then cut each "key":value piece at its first colon.
    SELECT attrib = REPLACE(REPLACE(SUBSTRING(split.value, 1, mid.point-1),'{',''),'"',''),
           val    = REPLACE(REPLACE(SUBSTRING(split.value, mid.point+1, 8000),'{',''),'"','')
    FROM STRING_SPLIT(@AuditDate,',') AS split
    CROSS APPLY (VALUES(CHARINDEX(':', split.value))) AS mid(point)
    WHERE REPLACE(REPLACE(SUBSTRING(split.value, 1, mid.point-1),'{',''),'"','') IN
          ('Id','RecordType','CreationTime','Operation','OrganizationId','UserType')
) AS split;
Result:
Id RecordType CreationTime Operation OrganizationId UserType
---------- ----------- --------------------- ----------- --------------- ---------
44de2468 20 2018-08-03T12:30:34 ViewReport 779558 0
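The query above works on a single variable. As a rough sketch of how the same splitter idea might be applied to a whole table, the example below uses a hypothetical table variable `@Audit` with a made-up `RowId` key and two shortened, invented sample rows; none of these names come from the question. It assumes SQL Server 2016+ (for STRING_SPLIT) and, like the original, that no key or value contains a comma.

```sql
-- Hypothetical table variable standing in for the real audit table.
DECLARE @Audit TABLE
(
    RowId     INT IDENTITY(1,1) PRIMARY KEY,
    AuditDate VARCHAR(8000)
);

-- Shortened, made-up sample rows in the same doubled-quote format.
INSERT @Audit (AuditDate)
VALUES
 ('"{""Id"":""44de2468"",""RecordType"":20,""Operation"":""ViewReport"",""UserType"":0}"')
,('"{""Id"":""44de2469"",""RecordType"":20,""Operation"":""ViewDashboard"",""UserType"":0}"');

SELECT
    a.RowId,
    Id         = MAX(CASE kv.attrib WHEN 'Id'         THEN kv.val END),
    RecordType = MAX(CASE kv.attrib WHEN 'RecordType' THEN kv.val END),
    Operation  = MAX(CASE kv.attrib WHEN 'Operation'  THEN kv.val END),
    UserType   = MAX(CASE kv.attrib WHEN 'UserType'   THEN kv.val END)
FROM @Audit AS a
CROSS APPLY STRING_SPLIT(a.AuditDate, ',') AS s
-- NULLIF turns "no colon found" (0) into NULL, so LEFT/SUBSTRING return NULL instead of failing.
CROSS APPLY (VALUES (NULLIF(CHARINDEX(':', s.value), 0))) AS mid(point)
-- Strip braces and quotes from both halves of each key:value piece.
CROSS APPLY (VALUES
(
    REPLACE(REPLACE(LEFT(s.value, mid.point - 1), '{', ''), '"', ''),
    REPLACE(REPLACE(REPLACE(SUBSTRING(s.value, mid.point + 1, 8000), '{', ''), '}', ''), '"', '')
)) AS kv(attrib, val)
GROUP BY a.RowId;  -- one output row per source row
```

Grouping by the row key is what pivots the key/value pieces back into one row per audit record. Every value comes back as a string, so cast the columns if you need typed results.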
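Answer 1 (score: 1)
It looks like you are dealing with a malformed JSON column here. Those doubled double quotes are the problem.
However, if you can clean up the format, you can simply use the JSON functions in your query.
First, set up the data (using the data you provided in the other copy of this question, "Split column values into multiple columns"):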
DECLARE @t TABLE
(
RecordType NVARCHAR(20)
,AuditDate NVARCHAR(MAX)
);
INSERT @t
(
RecordType
,AuditDate
)
VALUES
('View', '{""Id"":""44de2468"",""Type"":20,""CreationDate"":""2018-08-23""}')
,('Edit', '{""Id"":""44de2467"",""Type"":40,""CreationDate"":""2018-08-24""}')
,('Print', '{""Id"":""44de2768"",""Type"":60,""CreationDate"":""2018-05-06""}')
,('Delete', '{""Id"":""44de2488"",""Type"":30,""CreationDate"":""2018-07-20""}');
Now clean up the malformed JSON by replacing each doubled double quote with a single double quote.
UPDATE @t
SET AuditDate = REPLACE(AuditDate,'""','"');
Check what the JSON looks like now.
SELECT * FROM @t
--Results:
+------------+---------------------------------------------------------+
| RecordType | AuditDate |
+------------+---------------------------------------------------------+
| View | {"Id":"44de2468","Type":20,"CreationDate":"2018-08-23"} |
| Edit | {"Id":"44de2467","Type":40,"CreationDate":"2018-08-24"} |
| Print | {"Id":"44de2768","Type":60,"CreationDate":"2018-05-06"} |
| Delete | {"Id":"44de2488","Type":30,"CreationDate":"2018-07-20"} |
+------------+---------------------------------------------------------+
Then use JSON_VALUE() to extract the parts you are interested in.
SELECT
RecordType
, JSON_VALUE(AuditDate, '$.Id') AS [Id]
, JSON_VALUE(AuditDate, '$.Type') AS [Type]
, JSON_VALUE(AuditDate, '$.CreationDate') AS CreationDate
FROM @t
--Results
+------------+----------+------+--------------+
| RecordType | Id | Type | CreationDate |
+------------+----------+------+--------------+
| View | 44de2468 | 20 | 2018-08-23 |
| Edit | 44de2467 | 40 | 2018-08-24 |
| Print | 44de2768 | 60 | 2018-05-06 |
| Delete | 44de2488 | 30 | 2018-07-20 |
+------------+----------+------+--------------+
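If updating the stored data is not an option, the cleanup can probably be done inline instead. The sketch below is one possible variation, not part of the original answer: it re-declares `@t` so that it is self-contained, adds an invented 'Broken' row, and uses ISJSON() to guard each JSON_VALUE() call, since JSON_VALUE() can raise an error on text that is not valid JSON. The column aliases `CleanJson` and `IsValidJson` are made up for illustration.

```sql
DECLARE @t TABLE
(
     RecordType NVARCHAR(20)
    ,AuditDate  NVARCHAR(MAX)
);

INSERT @t (RecordType, AuditDate)
VALUES
 ('View',   '{""Id"":""44de2468"",""Type"":20,""CreationDate"":""2018-08-23""}')
,('Broken', 'not json at all');  -- hypothetical bad row to show the guard

SELECT
     t.RecordType
    ,IsValidJson  = ISJSON(c.CleanJson)
     -- CASE evaluates its WHEN first, so JSON_VALUE only runs on valid JSON.
    ,Id           = CASE WHEN ISJSON(c.CleanJson) = 1 THEN JSON_VALUE(c.CleanJson, '$.Id') END
    ,[Type]       = CASE WHEN ISJSON(c.CleanJson) = 1 THEN JSON_VALUE(c.CleanJson, '$.Type') END
    ,CreationDate = CASE WHEN ISJSON(c.CleanJson) = 1 THEN JSON_VALUE(c.CleanJson, '$.CreationDate') END
FROM @t AS t
-- Clean the doubled quotes on the fly instead of updating the stored value.
CROSS APPLY (VALUES (REPLACE(t.AuditDate, '""', '"'))) AS c(CleanJson);
```

The stored column stays untouched, at the cost of running REPLACE on every read.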
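Answer 2 (score: 1)
With SQL Server 2016 this is quite easy, because it has fairly good JSON support. Your only problem is that your string is not valid JSON. Apparently some engine doubled all the internal quotes (an escaping technique).
If this is under your control, you should try to change the column's format to proper JSON. Ideally the writing application would deliver these audits as proper JSON. At the very least you could add a second column and use a trigger to keep it in sync. As a last resort, you can use REPLACE to repair your string: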
REPLACE(REPLACE(REPLACE(@YourString,'"{','{'),'}"','}'),'""','"');
With many rows this may take quite some time, which is why it is better to store the data as proper JSON in the first place.
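The "second column plus trigger" idea mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the table `dbo.AuditLog`, its key `AuditLogId`, the column `AuditJson`, and the trigger name are all hypothetical, and `GO` is a batch separator understood by SSMS and sqlcmd rather than a T-SQL statement.

```sql
-- Hypothetical table: the raw value plus a cleaned copy kept in sync by a trigger.
CREATE TABLE dbo.AuditLog
(
    AuditLogId INT IDENTITY(1,1) PRIMARY KEY,
    AuditDate  NVARCHAR(MAX) NOT NULL,  -- raw value with doubled quotes
    AuditJson  NVARCHAR(MAX) NULL       -- repaired JSON, maintained by the trigger
);
GO

CREATE TRIGGER dbo.trg_AuditLog_SyncJson
ON dbo.AuditLog
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;

    -- Skip statements that did not touch the raw column
    -- (this includes the UPDATE issued by the trigger itself).
    IF NOT UPDATE(AuditDate) RETURN;

    UPDATE a
    SET    AuditJson = REPLACE(REPLACE(REPLACE(i.AuditDate, '"{', '{'), '}"', '}'), '""', '"')
    FROM   dbo.AuditLog AS a
    JOIN   inserted     AS i ON i.AuditLogId = a.AuditLogId;
END;
GO
```

With this in place the REPLACE cost is paid once per write, and queries can read `AuditJson` directly with the JSON functions.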
Just to show the principle:
DECLARE @YourString NVARCHAR(MAX)=N'"{""Id"":""44de2468"",""RecordType"":20,""CreationTime"":""2018-08-03T12:30:34"",""Operation"":""ViewReport"",""OrganizationId"":""779558"",""UserType"":0,""UserKey"":""FFFA3DA"",""Workload"":""PowerBI"",""UserId"":""john@abc.com"",""ClientIP"":""9.5.3.26"",""UserAgent"":""Mozilla\/5.0 (Windows NT 10.0;"",""Activity"":""ViewReport"",""ItemName"":""Sales"",""WorkSpaceName"":""TeamITO"",""DatasetName"":""Sales1"",""ReportName"":""Sales1"",""WorkspaceId"":""e8eaa0ca"",""ObjectId"":""Sales1"",""DatasetId"":""4c5d-ad45-eb6546"",""ReportId"":""4cb0-99ad-de41b5160c47"",""IsSuccess"":true,""DatapoolRefreshScheduleType"":""None"",""DatapoolType"":""Undefined""}"';
SET @YourString = REPLACE(REPLACE(REPLACE(@YourString,'"{','{'),'}"','}'),'""','"');
Your string will now look like this:
{"Id":"44de2468","RecordType":20,"CreationTime":"2018-08-03T12:30:34","Operation":"ViewReport","OrganizationId":"779558","UserType":0,"UserKey":"FFFA3DA","Workload":"PowerBI","UserId":"john@abc.com","ClientIP":"9.5.3.26","UserAgent":"Mozilla\/5.0 (Windows NT 10.0;","Activity":"ViewReport","ItemName":"Sales","WorkSpaceName":"TeamITO","DatasetName":"Sales1","ReportName":"Sales1","WorkspaceId":"e8eaa0ca","ObjectId":"Sales1","DatasetId":"4c5d-ad45-eb6546","ReportId":"4cb0-99ad-de41b5160c47","IsSuccess":true,"DatapoolRefreshScheduleType":"None","DatapoolType":"Undefined"}
This query returns all fields as a list, one row per key:
SELECT *
FROM OPENJSON(@YourString);
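The result is a list with a type hint for each value (1 = string, 2 = number, 3 = true/false), although the actual data type of the "value" column is nvarchar: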
+-----------------------------+-------------------------------+------+
| key | value | type |
+-----------------------------+-------------------------------+------+
| Id | 44de2468 | 1 |
+-----------------------------+-------------------------------+------+
| RecordType | 20 | 2 |
+-----------------------------+-------------------------------+------+
| CreationTime | 2018-08-03T12:30:34 | 1 |
+-----------------------------+-------------------------------+------+
| Operation | ViewReport | 1 |
+-----------------------------+-------------------------------+------+
| OrganizationId | 779558 | 1 |
+-----------------------------+-------------------------------+------+
| UserType | 0 | 2 |
+-----------------------------+-------------------------------+------+
| UserKey | FFFA3DA | 1 |
+-----------------------------+-------------------------------+------+
| Workload | PowerBI | 1 |
+-----------------------------+-------------------------------+------+
| UserId | john@abc.com | 1 |
+-----------------------------+-------------------------------+------+
| ClientIP | 9.5.3.26 | 1 |
+-----------------------------+-------------------------------+------+
| UserAgent | Mozilla/5.0 (Windows NT 10.0; | 1 |
+-----------------------------+-------------------------------+------+
| Activity | ViewReport | 1 |
+-----------------------------+-------------------------------+------+
| ItemName | Sales | 1 |
+-----------------------------+-------------------------------+------+
| WorkSpaceName | TeamITO | 1 |
+-----------------------------+-------------------------------+------+
| DatasetName | Sales1 | 1 |
+-----------------------------+-------------------------------+------+
| ReportName | Sales1 | 1 |
+-----------------------------+-------------------------------+------+
| WorkspaceId | e8eaa0ca | 1 |
+-----------------------------+-------------------------------+------+
| ObjectId | Sales1 | 1 |
+-----------------------------+-------------------------------+------+
| DatasetId | 4c5d-ad45-eb6546 | 1 |
+-----------------------------+-------------------------------+------+
| ReportId | 4cb0-99ad-de41b5160c47 | 1 |
+-----------------------------+-------------------------------+------+
| IsSuccess | true | 3 |
+-----------------------------+-------------------------------+------+
| DatapoolRefreshScheduleType | None | 1 |
+-----------------------------+-------------------------------+------+
| DatapoolType | Undefined | 1 |
+-----------------------------+-------------------------------+------+
Even better, you can add a WITH clause like this:
SELECT *
FROM OPENJSON(@YourString)
WITH
(
Id varchar(200) '$.Id',
RecordType int '$.RecordType',
CreationTime datetime '$.CreationTime'
--Add all your known columns here...
)
This gives you typed values, side by side in a single row:
+----------+------------+-------------------------+
| Id | RecordType | CreationTime |
+----------+------------+-------------------------+
| 44de2468 | 20 | 2018-08-03 12:30:34.000 |
+----------+------------+-------------------------+
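To connect this back to the six columns asked for in the question, the pieces above might be combined over a table as sketched below. The table variable `@Audit` and the alias `CleanJson` are hypothetical, the sample row is a shortened copy of the value from the question, and the column types and lengths are guesses. OPENJSON needs SQL Server 2016+ with database compatibility level 130 or higher, and it raises an error if the repaired text is still not valid JSON.

```sql
-- Hypothetical table variable standing in for the real audit table.
DECLARE @Audit TABLE (AuditDate NVARCHAR(MAX));

-- Shortened copy of the sample value from the question.
INSERT @Audit (AuditDate)
VALUES
(N'"{""Id"":""44de2468"",""RecordType"":20,""CreationTime"":""2018-08-03T12:30:34"",""Operation"":""ViewReport"",""OrganizationId"":""779558"",""UserType"":0}"');

SELECT j.*
FROM @Audit AS a
-- Repair the string per row: drop the outer quotes, then un-double the inner ones.
CROSS APPLY (VALUES
    (REPLACE(REPLACE(REPLACE(a.AuditDate, '"{', '{'), '}"', '}'), '""', '"'))
) AS c(CleanJson)
-- Shred the repaired JSON into typed columns.
CROSS APPLY OPENJSON(c.CleanJson)
WITH
(
    Id             VARCHAR(200) '$.Id',
    RecordType     INT          '$.RecordType',
    CreationTime   DATETIME2(0) '$.CreationTime',
    Operation      VARCHAR(100) '$.Operation',
    OrganizationId VARCHAR(200) '$.OrganizationId',
    UserType       INT          '$.UserType'
) AS j;
```

Because OPENJSON is applied once per row through CROSS APPLY, each audit record comes back as one typed row, and further fields can be added to the WITH clause as needed.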