考虑以下情况。我有下表
k = 0
for i in range(len(columns)):
for j in range(i + 1, len(columns)):
if columns[i].lower() == columns[j].lower():
k = k+1
df = (df.withColumn(columns[i].upper()+str(k),concat(col(columns[i]),lit(","), col(columns[j]))))
GoldenEgg样本数据:
SubscriptionID 6070的 CREATE TABLE [dbo].[GoldenEgg]
(
rowIndex int NOT NULL IDENTITY(1,1),
AccountNumber varchar(256) NULL,
SubscriptionID int NOT NULL,
SubscriptionData_XML xml NULL,
SubscriptionData_AFTER_XML NULL
CONSTRAINT [PK_GoldenEgg]
PRIMARY KEY CLUSTERED ([rowIndex] ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
数据:
SubscriptionData_XML
我想将每个SubscriptionID的所有帐号附加到SubscriptionData_XML列中已存在的xml <NVPList xmlns="http://www.whatevernamspace.com/v1" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Item>
<Name>AccountNumbers</Name>
<Value>
<ValueItem>39448474</ValueItem>
</Value>
</Item>
</NVPList>
节点,我不想添加xml中已存在的帐号。
因此,对于SubscriptionID 6070,帐号39448474应仅在xml中列出一次,如下所示:
<Value>
答案 0 :(得分:2)
如果您的XML中没有其他节点,您可以选择FLWOR-query。
一些提示:
FOR XML
- 子选择没有命名空间来构建<Value>
节点,而不用担心实际XML中已有的ID FLWOR-query()
从刚创建的Value-node UPDATE
SELECT * FROM @tbl
向您显示所有AFTER_XML
已填写试试这个:
DECLARE @tbl TABLE(rowIndex INT IDENTITY,AccountNumber INT,SubscriptionID INT, SubscriptionData_XML XML,SubscriptionData_AFTER_XML XML);
INSERT INTO @tbl VALUES
(1111,6070,N'<NVPList xmlns="http://www.whatevernamspace.com/v1" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Item>
<Name>AccountNumbers</Name>
<Value>
<ValueItem>39448474</ValueItem>
</Value>
</Item>
</NVPList>',NULL)
,(2222,6070,NULL,NULL)
,(3333,6070,NULL,NULL)
,(4444,6070,NULL,NULL)
,(5555,6071,N'<NVPList xmlns="http://www.whatevernamspace.com/v1" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Item>
<Name>AccountNumbers</Name>
<Value>
<ValueItem>39448474</ValueItem>
</Value>
</Item>
</NVPList>',NULL)
,(6666,6071,NULL,NULL)
,(7777,6071,NULL,NULL)
,(8888,6071,NULL,NULL);
- 这里启动可更新的CTE
WITH UpdateableCTE AS
(
SELECT t1.rowIndex
,t1.SubscriptionData_AFTER_XML
,(
SELECT t2.AccountNumber AS ValueItem
FROM @tbl AS t2
WHERE t2.SubscriptionID=t1.SubscriptionID
FOR XML PATH(''),ROOT('Value'),TYPE
).query
(N'declare default element namespace "http://www.whatevernamspace.com/v1";
let $nd:=/*:Value
return
<NVPList>
<Item>
<Name>{sql:column("XmlName")}</Name>
<Value>
{
for $vi in $nd/*:ValueItem
return <ValueItem>{$vi/text()}</ValueItem>
}
</Value>
</Item>
</NVPList>
'
) AS NewXML
FROM @tbl AS t1
CROSS APPLY( SELECT t1.SubscriptionData_XML.value('(//*:Name)[1]','nvarchar(max)') AS XmlName) AS x
WHERE SubscriptionData_XML IS NOT NULL
)
- UPDATE语句
UPDATE UpdateableCTE SET SubscriptionData_AFTER_XML=NewXML
FROM UpdateableCTE;
- 用于检查成功的SELECT
SELECT * FROM @tbl
答案 1 :(得分:1)
我能够使用xml UPDATE
方法使用sql modify()
语句完成此任务,而不使用任何循环。以下是解决方案的细分:
1)我必须获得SubscriptionID的所有AccountNumbers并将其格式化
进入xml <ValueItem>
个节点。
SQL QUERY 1:
SELECT
ge.SubscriptionID,
CAST((SELECT DISTINCT ValueItem = ISNULL(ge2.AccountNumber,'')
FROM dbo.GoldenEgg ge2
WHERE ge2.SubscriptionID = ge.SubscriptionID
FOR XML PATH('')) AS xml) AS AccountNumberXml
FROM dbo.GoldenEgg ge
WHERE ge.SubscriptionData_XML IS NOT NULL
SQL QUERY 1 XML RESULT (SubscriptionID 6070):
<ValueItem>39448474</ValueItem>
<ValueItem>41447395</ValueItem>
<ValueItem>56936495</ValueItem>
<ValueItem>70660044</ValueItem>
2)现在我将AccountNumbers放在一个值中,现在我可以使用xml modify()
方法并将AccountNumberXml
值插入<Value>
xml节点的最后位置。我将使用带有UPDATE
的{{1}}语句执行此操作。另请注意,在执行任何操作之前,我最初将SubscriptionData_AFTER_XML设置为等于SubscriptionData_XML。
SQL QUERY 2:
INNER JOIN
SQL QUERY 2 XML RESULT (SubscriptionID 6070 SubscriptionData_AFTER_XML列):
UPDATE ge
SET SubscriptionData_AFTER_XML.modify
('declare default element namespace "http://www.whatevernamspace.com/v1";
insert sql:column("t1.AccountNumberXml") as last into (/NVPList/Item/Value)[1]')
FROM dbo.GoldenEgg ge
INNER JOIN (SELECT
ge2.SubscriptionID,
CAST((SELECT DISTINCT ValueItem = ISNULL(ge1.AccountNumber,'')
FROM dbo.GoldenEgg ge1
WHERE ge1.SubscriptionID = ge2.SubscriptionID
FOR XML PATH('')) AS xml) as AccountNumberXml
FROM dbo.GoldenEgg ge2
WHERE ge2.SubscriptionData_AFTER_XML IS NOT NULL) t1 ON t1.SubscriptionID = ge.SubscriptionID
WHERE ge.SubscriptionData_AFTER_XML IS NOT NULL
正如您所看到的,SubscriptionData_AFTER_XML列中的最终xml结果现在存在两个问题。
对于subscriptionID 6070,帐户号码39448474正在<NVPList xmlns="http://www.whatevernamspace.com/v1" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Item>
<Name>AccountNumbers</Name>
<Value>
<ValueItem>39448474</ValueItem>
<ValueItem xmlns="">39448474</ValueItem>
<ValueItem xmlns="">41447395</ValueItem>
<ValueItem xmlns="">56936495</ValueItem>
<ValueItem xmlns="">70660044</ValueItem>
</Value>
</Item>
</NVPList>
节点列表中重复,这是我不想要的。要解决此问题,我必须查询xml中的当前AccountNumber值,并从之前的<ValueItem>
SQL QUERY 3:
此查询将为我提供包含SubscriptionData_XML列中所有当前AccountNumbers的结果集,然后我可以使用该结果集从 SQL QUERY 1 结果集中排除这些AccountNumbers
INNER JOIN
现在将它们放在一起以获得正确的最终结果
SQL QUERY 4:
SELECT SubscriptionID, t.c.value('.', 'varchar(MAX)') as CurrentValueItems
FROM dbo.GoldenEgg
CROSS APPLY SubscriptionData_XML.nodes('declare default element namespace "http://www.whatevernamspace.com/v1";
/NVPList/Item/Value/ValueItem') as t(c)
WHERE SubscriptionData_XML IS NOT NULL
SQL QUERY 4 XML RESULT (SubscriptionID 6070 SubscriptionData_AFTER_XML列):
正如您所见,AccountNumber 39448474现在仅在xml中列出一次
UPDATE ge
SET SubscriptionData_AFTER_XML.modify
('declare default element namespace "http://www.whatevernamspace.com/v1";
insert sql:column("t1.AccountNumberXml") as last into (/NVPList/Item/Value)[1]')
FROM dbo.GoldenEgg ge
INNER JOIN (SELECT
ge2.SubscriptionID,
CAST((SELECT DISTINCT ValueItem = ISNULL(ge1.AccountNumber,'')
FROM dbo.GoldenEgg ge1
--make sure we are not inserting AccountNumbers that already exists in the subscription data
WHERE ge1.AccountNumber NOT IN (SELECT t.c.value('.', 'varchar(MAX)') as CurrentValueItems
FROM dbo.GoldenEgg
CROSS APPLY SubscriptionData_XML.nodes('declare default element namespace "http://www.whatevernamspace.com/v1";
/NVPList/Item/Value/ValueItem') as t(c)
WHERE SubscriptionData_XML IS NOT NULL
AND SubscriptionID = ge2.SubscriptionID)
AND ge1.SubscriptionID = ge2.SubscriptionID
FOR XML PATH('')) AS xml) as AccountNumberXml
FROM dbo.GoldenEgg ge2
WHERE ge2.SubscriptionData_AFTER_XML IS NOT NULL) t1 ON t1.SubscriptionID = ge.SubscriptionID
WHERE ge.SubscriptionData_AFTER_XML IS NOT NULL
当插入带有AccountNumber节点列表时,它将插入一个空的<NVPList xmlns="http://www.whatevernamspace.com/v1" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
<Item>
<Name>AccountNumbers</Name>
<Value>
<ValueItem>39448474</ValueItem>
<ValueItem xmlns="">41447395</ValueItem>
<ValueItem xmlns="">56936495</ValueItem>
<ValueItem xmlns="">70660044</ValueItem>
</Value>
</Item>
</NVPList>
命名空间。这是我用来删除空xmlns=""
命名空间的查询。
SQL QUERY 5:
xmlns=""
SQL QUERY 5 XML RESULT (SubscriptionID 6070):
UPDATE dbo.GoldenEgg
SET SubscriptionData_AFTER_XML = CONVERT(XML, REPLACE(CONVERT(NVARCHAR(MAX), SubscriptionData_AFTER_XML), N'xmlns=""',''))
WHERE SubscriptionData_AFTER_XML IS NOT NULL
我希望这可以帮助任何可能需要做类似事情的人