I have the following data:
<!-- subjects.xml -->
<Subjects>
<Subject>
<Id>1</Id>
<Name>Maths</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Science</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Advanced Science</Name>
</Subject>
<Subject>
<Id>3</Id>
<Name>History</Name>
</Subject>
</Subjects>
which is to be joined with:
<!-- courses.xml-->
<Courses>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra I</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra II</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Percentages</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Physics</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Biology</Name>
</Course>
</Courses>
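I want to left join the first table with the second, so that every subject appears even if it has no courses, to get the following output: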
<Results>
<Result>
<Table1>
<Subject>
<Id>1</Id>
<Name>Maths</Name>
</Subject>
</Table1>
<Table2>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra I</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Algebra II</Name>
</Course>
<Course>
<SubjectId>1</SubjectId>
<Name>Percentages</Name>
</Course>
</Table2>
</Result>
<Result>
<Table1>
<!-- Notice there are 2 subjects here, as they both have the same ID-->
<Subject>
<Id>2</Id>
<Name>Science</Name>
</Subject>
<Subject>
<Id>2</Id>
<Name>Advanced Science</Name>
</Subject>
</Table1>
<Table2>
<Course>
<SubjectId>2</SubjectId>
<Name>Physics</Name>
</Course>
<Course>
<SubjectId>2</SubjectId>
<Name>Biology</Name>
</Course>
</Table2>
</Result>
<Result>
<Table1>
<Subject>
<Id>3</Id>
<Name>History</Name>
</Subject>
</Table1>
<Table2>
<!-- Notice this section is empty -->
</Table2>
</Result>
</Results>
I have the following code to do this:
<Results>
{
(: PART 1: for each group of courses whose SubjectId exists in "subjects.xml" :)
for $e2 in doc("courses.xml")/Courses/Course
let $foriegnId := $e2/SubjectId
group by $foriegnId
let $e1 := doc("subjects.xml")/Subjects/Subject[Id = $foriegnId]
where $e1
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2>
{$e2}
</Table2>
</Result>
}
{
(: PART 2 :)
(: Show the remaining subjects, i.e. those that have no courses and so were not output above :)
for $e1 in doc('subjects.xml')/Subjects/Subject
let $idVal := $e1/Id
group by $idVal
where not(doc('courses.xml')/Courses/Course/SubjectId = $idVal)
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2/>
</Result>
}
</Results>
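Note that the code works correctly and does the job. However, when I run it on a large input (750 subjects with 120 courses each, plus 100 subjects without any courses and 100 courses without any subject), the script runs extremely slowly!
How can I make my script faster? Is there a better way to do this? What is the time complexity?
Update 2
It turns out I badly misidentified the problem. It actually has little to do with part 2 of the code; the problem is in part 1.
What I was doing was: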
for $e2 in doc("courses.xml")/Courses/Course
let $foriegnId := $e2/SubjectId
let $e1 := doc("subjects.xml")/Subjects/Subject[Id = $foriegnId]
group by $foriegnId
when what I should have been doing is:
for $e2 in doc("courses.xml")/Courses/Course
let $foriegnId := $e2/SubjectId
group by $foriegnId
let $e1 := doc("subjects.xml")/Subjects/Subject[Id = $foriegnId]
This reduced the running time from 30,000 ms to about 4,000 ms.
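A likely reason for the difference: with the `let $e1` before `group by`, the lookup into subjects.xml is evaluated once per course, while after `group by` it is evaluated once per distinct SubjectId. The sketch below only counts both quantities; the `Lookups`, `BeforeGroupBy` and `AfterGroupBy` element names are made up for illustration.

```xquery
(: Illustrative only: compares how many times the subject lookup would run :)
(: in each variant. Element names here are hypothetical. :)
let $courses := doc("courses.xml")/Courses/Course
return
  <Lookups>
    <!-- one lookup per course when the let precedes group by -->
    <BeforeGroupBy>{count($courses)}</BeforeGroupBy>
    <!-- one lookup per distinct SubjectId when the let follows group by -->
    <AfterGroupBy>{count(distinct-values($courses/SubjectId))}</AfterGroupBy>
  </Lookups>
```

For the sample courses.xml above this gives 5 and 2. Going by the figures in the question, the large input would have about 90,100 courses but at most 850 distinct SubjectId values.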
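Further performance improvements are welcome.
Answer 0 (score: 1)
Depending on how the query is optimized, the list of IDs may be assembled again and again, once for every subject. Fetch the list once up front, and then check against it.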
let $subjectIds := doc('courses.xml')/Courses/Course/SubjectId
for $e1 in doc('subjects.xml')/Subjects/Subject
let $idVal := $e1/Id
group by $idVal
where not($subjectIds = $idVal)
return
<Result>
<Table1>
{$e1}
</Table1>
<Table2/>
</Result>
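A further optimization might be to trim the list of subject IDs, which contains many repeated values, down to the sequence of its distinct values: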
let $subjectIds := distinct-values(doc('courses.xml')/Courses/Course/SubjectId)
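On the time-complexity part of the question: a predicate such as `Subject[Id = $foriegnId]` may scan the whole other document for every group, unless the processor happens to index it, so the join can approach (number of groups) x (number of nodes scanned). One possible way around this, assuming an XQuery 3.1 processor with map support, is to build a lookup map once and drive the loop from the subjects. This is a sketch, not a measured result; `$coursesById` and `$key` are names invented here.

```xquery
xquery version "3.1";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Assumption: XQuery 3.1 maps are available. :)
(: Build the lookup once: SubjectId (as a string) -> all courses with that id. :)
let $coursesById := map:merge(
  for $c in doc("courses.xml")/Courses/Course
  group by $key := string($c/SubjectId)
  return map:entry($key, $c)
)
return
<Results>
{
  (: Driving the loop from subjects gives left-join semantics directly: :)
  (: a subject group with no courses gets an empty Table2. :)
  for $e1 in doc("subjects.xml")/Subjects/Subject
  group by $idVal := string($e1/Id)
  return
    <Result>
      <Table1>{$e1}</Table1>
      <Table2>{$coursesById($idVal)}</Table2>
    </Result>
}
</Results>
```

If map lookups are close to constant time, the work should grow roughly with the number of subjects plus the number of courses. Courses whose SubjectId matches no subject are dropped, as in the original query, and the order of the `Result` elements is implementation-dependent unless an `order by` clause is added.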