Question

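I am writing an XQuery to analyze a large number of XML files that store queries like the example below. For these queries I want to compute averages, sums, and other information over various child elements. In addition, I want to output subsets of the queries in the same document, for example all queries that returned no hits.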

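Since I will be working with hundreds of thousands of XML files, I want to make my XQuery as efficient as possible. I have tried to use a single for iteration over the documents, but I simply cannot figure out how to get all the information I need.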

Here is a sample of the XML:

<Query>
  <QueryString>Gigabyte Sapphire GTX-860</QueryString>
  <StatusCode>0</StatusCode>
  <QueryTime>0.04669069110297385</QueryTime>
  <Hits>8</Hits>
  <Date>2013-05-02</Date>
  <Time>12:07:07</Time>
  <LastModified>12:07:07</LastModified>
  <Pages resultsPerPage="10" clickCount="2">
    <Page resultCount="8" visited="true">
      <Result index="1" clickIndex="0" timeViewed="0" pid="85405" title="DDR3 1024 MB" />
      <Result index="2" clickIndex="1" timeViewed="178" pid="54065" title="ATK Excellium&#x9;" />
      <Result index="3" clickIndex="0" timeViewed="0" pid="74902" title="Intel E9650" />
      <Result index="4" clickIndex="0" timeViewed="0" pid="56468" title="ASUS Radeon HD 7980" />
      <Result index="5" clickIndex="0" timeViewed="0" pid="31072" title="Intel E7500" />
      <Result index="6" clickIndex="0" timeViewed="0" pid="26620" title="DDR3 2048 MB" />
      <Result index="7" clickIndex="2" timeViewed="92" pid="55625" title="Gigabyte Sapphire 7770" />
      <Result index="8" clickIndex="0" timeViewed="0" pid="67701" title="Intel E9650" />
    </Page>
  </Pages>
</Query>

Here is the XQuery:

let $doc := collection('file:///C:/REP/XML/input?select=*.xml')
for $y in (
    <Queries>
    {
        for $x in $doc
        let $hits := $x/Query/Hits
        return <Query hits="{$hits}" >{$x/Query/QueryString/string()}</Query>
    }
    </Queries>
)
let $avgHits := avg(data($y/Query/@hits))
let $numQueries := count($y/*)
return <Statistics avgHits="{$avgHits}" numQueries="{$numQueries}"/>

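For a sample of 10 XML files this correctly returns <Statistics numQueries="10" avgHits="19.7"/>. Is this the right approach? I seem to need the double for so that I can group the queries from the separate files together, since I cannot seem to run functions over them otherwise.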

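I also need to repeat some of the queries inside the created <Statistics> element. Do I need to repeat the FLWOR statement? I cannot put the sums or averages outside the for statement that computes them, but I also cannot compute them and perform a sub-selection in the same for, because I would have to include a where clause that filters the queries.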

(Update) Here is the query I came up with to include a subset of the queries, but as I mentioned, I am worried about performance.

let $doc := collection('file:///C:/REP/XML/input?select=*.xml')
for $y in (
    <Queries>
    {
        for $x in $doc
        let $hits := $x/Query/Hits
        return <Query hits="{$hits}" >{$x/Query/QueryString/string()}</Query>
    }
    </Queries>
)
let $avgHits := avg(data($y/Query/@hits))
let $numQueries := count($y/*)
return <Statistics avgHits="{$avgHits}" numQueries="{$numQueries}">
    {
    for $x in $doc
    let $hits := $x/Query/Hits
    where $x/Query/Hits < 10
    return <Query hits="{$hits}" >{$x/Query/QueryString/string()}</Query>
    }   
</Statistics>

Will the XQuery processor optimize my for statements, or will it visit all the XML files once for each of them? Does the first let statement prevent this?

This is the kind of document I intend to produce:

<DailyStats date="2013-04-15" >
    <DayStats>
        <QueryCount>24644</QueryCount>
        <Errors>0</Errors>
        <EmptySearches>643</EmptySearches>
        <AverageSearchTime>0.0213</AverageSearchTime>
        <AverageSearchesPerHour>236</AverageSearchesPerHour>
    </DayStats>
    <StoredQueries>
        <FailedSearches>
            <FailedSearch time="23:33:34" query="blurey" searchTime="0.0524" />
        </FailedSearches>
    </StoredQueries>
</DailyStats>

Answer 1

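If you are concerned about performance, you should use an XML database (if you are not doing so already), since it can improve performance by indexing your data. With BaseX, for example, you can load the XML files into a database and use `db:open("your-db")` to access all nodes without the nested for loop. You can also use some database-specific indexes, which will speed up your queries. If you have a plain XQuery processor working on the file system, it will definitely touch every XML file, because it knows nothing about the data inside each file.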
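As a minimal sketch of the database approach, the query below assumes BaseX and a database named "your-db" that has already been created from the input directory (for example with the BaseX `CREATE DB` command). The database name is a placeholder, and the output shape simply mirrors the question's <Statistics> element.

```xquery
(: Sketch only. Assumes BaseX and an existing database "your-db" built from the
   input directory. In BaseX 10 and later the function is named db:get. :)
let $queries := db:open("your-db")/Query
return
  <Statistics avgHits="{avg($queries/Hits)}" numQueries="{count($queries)}"/>
```

Because every document in the database is reached through one path expression, the intermediate <Queries> wrapper and the outer for are no longer needed.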

Apart from that, your XQuery looks basically fine to me. As I tried to point out, optimization depends heavily on the processor or database you use.
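If you stay on the file system, one way to get the aggregates and the sub-selection without repeating the FLWOR is to bind the Query elements to a variable once and derive everything from it. This is a sketch, not a guaranteed optimization: it reuses the collection URI from the question, and whether each file is actually read only once still depends on the processor.

```xquery
(: Sketch only. Binds all Query root elements once, then reuses the variable
   for both the aggregates and the filtered sub-selection. :)
let $queries := collection('file:///C:/REP/XML/input?select=*.xml')/Query
return
  <Statistics avgHits="{avg($queries/Hits)}" numQueries="{count($queries)}">
  {
    (: Same filter as the question's where clause, written as a predicate. :)
    for $q in $queries[Hits < 10]
    return <Query hits="{$q/Hits}">{$q/QueryString/string()}</Query>
  }
  </Statistics>
```

Aggregate functions such as avg and count operate on the whole sequence, so they do not need to sit inside a for, and the predicate filters the same sequence without affecting them.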

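And yes, you will have to do some testing. It is almost impossible to say anything about real-world runtime, because it depends heavily on your data and your queries. Switching to a database later should not be too hard, though, so I would not worry about it too much.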
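Whichever backend you end up testing, the target document from the question could be built from the same single binding. The sketch below rests on several assumptions that the question does not confirm: a StatusCode other than 0 counts as an error, a search with Hits equal to 0 counts as both an empty and a failed search, and the hard-coded $date is only a placeholder. AverageSearchesPerHour is left out because its definition is not given.

```xquery
(: Sketch only. Assumptions: StatusCode != 0 marks an error, Hits = 0 marks an
   empty (and failed) search, and $date is a placeholder value. :)
let $date := '2013-05-02'
let $queries := collection('file:///C:/REP/XML/input?select=*.xml')/Query[Date = $date]
let $empty := $queries[Hits = 0]
return
  <DailyStats date="{$date}">
    <DayStats>
      <QueryCount>{count($queries)}</QueryCount>
      <Errors>{count($queries[StatusCode != 0])}</Errors>
      <EmptySearches>{count($empty)}</EmptySearches>
      <AverageSearchTime>{avg($queries/QueryTime)}</AverageSearchTime>
    </DayStats>
    <StoredQueries>
      <FailedSearches>
      {
        for $q in $empty
        return <FailedSearch time="{$q/Time}" query="{$q/QueryString}" searchTime="{$q/QueryTime}"/>
      }
      </FailedSearches>
    </StoredQueries>
  </DailyStats>
```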

Functions and sub-selections in a single FLWOR
