Using an IBM DB2 database, I have three relational tables:
Project: id, title, description
Topic: projectId, value
Tag: projectId, value
I need to produce the following XML file from these tables:
<projects>
<project id="project1">
<title>title1</title>
<description>desc1</description>
<topic>topic1</topic>
<topic>topic2</topic>
<tag>tag1</tag>
<tag>tag2</tag>
<tag>tag3</tag>
</project>
...
</projects>
I've tried the following query, and it works:
XQUERY
let $projects := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "project", XMLATTRIBUTES(id, title, description)) AS project FROM mydb.Project')
let $TopicSet := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "row", XMLATTRIBUTES(projectId, value)) FROM mydb.Topic')
let $TagSet := db2-fn:sqlquery('SELECT XMLELEMENT(NAME "row", XMLATTRIBUTES(projectId, value)) FROM mydb.Tag')
for $project in $projects return
<project>
{$project/@ID}
<title>{$project/fn:string(@TITLE)}</title>
<description>{$project/fn:string(@DESCRIPTION)}</description>
{for $row in $TopicSet[@PROJECTID=$project/@ID] return <Topic>{$row/fn:string(@VALUE)}</Topic>}
{for $row in $TagSet[@PROJECTID=$project/@ID] return <Tag>{$row/fn:string(@VALUE)}</Tag>}
</project>
;
However, it took 9 hours to complete (there are 200k projects in the table).
How can I improve that?
Do I really need the three intermediate db2-fn:sqlquery calls to achieve this, or is there another way?
Would it be faster to store the results of these three intermediate db2-fn:sqlquery calls in a table (with only one row and one column) and index it before running the "for $project in $projects return" part?
Or how would you go about achieving my goal?
Best regards,
David
---
As proposed by Peter Schuetze, I tried the XMLAGG as follows:
SELECT
XMLSERIALIZE(
XMLDOCUMENT(
XMLELEMENT(
NAME "Project",
XMLATTRIBUTES(P.project),
XMLAGG(XMLELEMENT(NAME "Topic", Topic.value)),
XMLAGG(XMLELEMENT(NAME "Tag", Tag.value))
)
) AS CLOB(1M)
)
FROM mydb.project P
LEFT JOIN mydb.Topic Topic ON (P.project = Topic.project)
LEFT JOIN mydb.Tag Tag ON (P.project = Tag.project)
GROUP BY P.project;
This is indeed much, much faster!
However, if a project has no topics, the output still contains a topic element with empty text, such as:
<projects>
<project id="project1">
<title>title1</title>
<description>desc1</description>
<topic></topic>
<tag>tag1</tag>
<tag>tag2</tag>
<tag>tag3</tag>
</project>
...
</projects>
How can I remove this empty "<topic></topic>" element?
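Answer 0 (score: 1)
If a column may be NULL and you do not want an empty element tag when that happens, use XMLFOREST instead of XMLELEMENT. So, for topics, you would replace the XMLELEMENT function with
XMLFOREST( Topic.value AS "topic" )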
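There is also a problem with having two XMLAGG functions in the same SELECT statement. With only one XMLAGG in the statement there is no issue, because the GROUP BY on the parent key neatly collapses the child entries specified in the XMLAGG. When you specify more than one XMLAGG function in the same SELECT, however, the query produces a Cartesian product internally, so you will see duplicates within each group returned by XMLAGG. Your example of a project with only zero or one topic does not expose the problem, but if a project had two topics and three tags, you would see each topic repeated three times and each tag repeated twice. To prevent this, move each XMLAGG into a subquery or common table expression that produces a single XML fragment per project, so that you can safely reference it from the main query.
Below is an example of pushing XMLAGG into common table expressions. It also removes the need for XMLFOREST, because XMLAGG produces nothing for an empty input set.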
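To see the row multiplication concretely, here is a minimal sketch that reproduces the two-topics, three-tags case. It is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for DB2, the table contents are made up, and plain COUNT aggregates take the place of XMLAGG (which SQLite does not have). The join fan-out it shows does not depend on the database engine.

```python
import sqlite3

# In-memory SQLite database as a stand-in for DB2 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id TEXT PRIMARY KEY);
    CREATE TABLE topic (projectid TEXT, value TEXT);
    CREATE TABLE tag (projectid TEXT, value TEXT);
    INSERT INTO project VALUES ('project1');
    INSERT INTO topic VALUES ('project1', 'topic1'), ('project1', 'topic2');
    INSERT INTO tag VALUES ('project1', 'tag1'), ('project1', 'tag2'),
                           ('project1', 'tag3');
""")

# Joining both child tables in one SELECT pairs every topic with every tag,
# so each aggregate sees 2 x 3 = 6 input rows for the single project.
row = conn.execute("""
    SELECT COUNT(*)                    AS joined_rows,
           COUNT(topic.value)          AS topic_values,
           COUNT(DISTINCT topic.value) AS distinct_topics,
           COUNT(tag.value)            AS tag_values,
           COUNT(DISTINCT tag.value)   AS distinct_tags
    FROM project p
    LEFT JOIN topic ON topic.projectid = p.id
    LEFT JOIN tag   ON tag.projectid   = p.id
    GROUP BY p.id
""").fetchone()

joined_rows, topic_values, distinct_topics, tag_values, distinct_tags = row
print("joined rows:", joined_rows)
print("each topic appears", topic_values // distinct_topics, "times")
print("each tag appears", tag_values // distinct_tags, "times")
```

An aggregate such as XMLAGG that runs over this joined result receives all six rows, which is where the repeated elements come from.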
WITH topicxml( projectid, xmlfragment ) AS (
    SELECT topic.projectid,
           XMLAGG( XMLELEMENT( NAME "topic", topic.value ) ORDER BY topic.value )
    FROM mydb.topic topic
    GROUP BY topic.projectid
),
tagxml( projectid, xmlfragment ) AS (
    SELECT projectid,
           XMLAGG( XMLELEMENT( NAME "tag", tag.value ) ORDER BY tag.value )
    FROM mydb.tag tag
    GROUP BY tag.projectid
)
SELECT XMLSERIALIZE(
    CONTENT XMLELEMENT(
        NAME "project",
        XMLATTRIBUTES( p.id AS "id" ),
        XMLELEMENT( NAME "title", p.title ),
        XMLELEMENT( NAME "description", p.description ),
        XMLCONCAT( topicxml.xmlfragment, tagxml.xmlfragment )
    ) AS VARCHAR(2000)
)
FROM mydb.project p
LEFT OUTER JOIN topicxml ON topicxml.projectid = p.id
LEFT OUTER JOIN tagxml ON tagxml.projectid = p.id
;
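If you want to check the shape of this approach without a DB2 instance, the following is a rough sketch of the same pre-aggregation pattern, assuming Python's built-in sqlite3 module and made-up sample rows. SQLite has no XMLAGG or XMLELEMENT, so group_concat with string concatenation stands in for them here; unlike the real XML functions it does no escaping, so treat it only as a demonstration of the query structure.

```python
import sqlite3

# In-memory SQLite database as a stand-in for DB2 (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE topic (projectid TEXT, value TEXT);
    CREATE TABLE tag (projectid TEXT, value TEXT);
    INSERT INTO project VALUES ('project1', 'title1'), ('project2', 'title2');
    INSERT INTO topic VALUES ('project1', 'topic1'), ('project1', 'topic2');
    INSERT INTO tag VALUES ('project1', 'tag1'), ('project1', 'tag2'),
                           ('project1', 'tag3'), ('project2', 'tag4');
""")

# Each child table is aggregated to one row per project BEFORE the join,
# so the main query never multiplies topics by tags.
rows = conn.execute("""
    WITH topicxml (projectid, fragment) AS (
        SELECT projectid, group_concat('<topic>' || value || '</topic>', '')
        FROM topic
        GROUP BY projectid
    ),
    tagxml (projectid, fragment) AS (
        SELECT projectid, group_concat('<tag>' || value || '</tag>', '')
        FROM tag
        GROUP BY projectid
    )
    SELECT p.id,
           COALESCE(topicxml.fragment, '') || COALESCE(tagxml.fragment, '')
    FROM project p
    LEFT JOIN topicxml ON topicxml.projectid = p.id
    LEFT JOIN tagxml   ON tagxml.projectid   = p.id
    ORDER BY p.id
""").fetchall()

fragments = dict(rows)
for project_id, fragment in rows:
    print(project_id, fragment)
```

project2 has no rows in topic, so it has no row in the topicxml expression at all; the outer join yields NULL for its fragment and no empty topic element is produced.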
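Answer 1 (score: 0)
Take a look at the XMLAGG function. It should be what you need. I have not tried it myself, but the example on the linked page is almost exactly what you want to do.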