WebHarvest: getting 50 results from one request

Time: 2014-09-08 15:55:40

Tags: xml http search request webharvest

I am new to this language and I am stuck on a simple task.

Basically, I want to get 50 results instead of the default 10 that the search returns. This is the code:

<include path="functions.xml"/>
<!-- The phrase to search for in the 4shared API -->
<var-def name="searchPhrase">Adele Rolling In The Deep</var-def>

<!-- The type of sorting that the 4shared API will use (2 = sort by upload date) -->
<var-def name="sortType">2</var-def>

<!-- The order for search results (-1 = ascending) -->
<var-def name="sortOrder">-1</var-def>

<!-- The page to start the search on (the API returns 10 results per page) -->
<var-def name="start">1</var-def>

<!-- Make a web request to the 4shared search API and store the XML returned in the 'pageXml' variable -->
<while condition="true" index="start" maxloops="5">
    <var-def name="pageXml" overwrite="true">

            <http url="http://search.4shared.com/network/searchXml.jsp" charset="UTF8">
            <http-param name="q"><template>${searchPhrase}</template></http-param>
            <http-param name="sortType"><template>${sortType}</template></http-param>
            <http-param name="sortOrder"><template>${sortOrder}</template></http-param>
            <http-param name="start"><template>${start}</template></http-param>
    </http> 

    </var-def>
</while>
<file action="write" type="text" path="xml/pageXml.xml">
    <var name="pageXml"></var>
</file>


<!-- Extract all 'url' nodes from the XML returned from the API call using XPath.
     The xpath expression used here is '//result-files/file/url/text()' meaning return the text value of all
     'url' nodes with a 'file' node, within a 'result-files' node.
--> 
<var-def name="urls" overwrite="true">
    <xpath expression="//result-files/file/url/text()">
        <var name="pageXml"/>
    </xpath>
</var-def>

<!-- Extract all 'name' nodes from the XML returned from the API call using XPath --> 


<!-- Extract all 'downloads-count' nodes from the XML returned from the API call using XPath --> 
<var-def name="downloadsCount">
    <xpath expression="//result-files/file/downloads-count/text()">
        <var name="pageXml"/>
    </xpath>
</var-def>
<file action="write" type="text" path="xml/downloadsAverage.xml">
            <var name="downloadsCount"></var>
</file>
<!-- Extract all 'size' nodes from the XML returned from the API call using XPath --> 
<var-def name="size">
    <xpath expression="//result-files/file/size/text()">
        <var name="pageXml"/>
    </xpath>
</var-def>
<file action="write" type="text" path="xml/size.xml">
            <var name="size"></var>
</file>




<!-- Calculate the average number of downloads.
     Note that the <script> tag is used in order to interact with the variables using BeanShell,
     which is a Java-like scripting language.
     Previously defined variables can be accessed directly; calling toList() converts one into
     a List, individual items can then be retrieved with the standard List method get(), and
     toString() converts an individual item to a String.
-->
<var-def name="averageDownloads">
    <script return="returnVariable"><![CDATA[
        var downloadsCountSize = downloadsCount.toList().size();
        var totalDownloads = 0;

        for(int i = 0; i < downloadsCountSize; i++) {
            totalDownloads = totalDownloads + new Integer(downloadsCount.toList().get(i).toString().replace(",",""));
        }

        var returnVariable = totalDownloads / downloadsCountSize;
    ]]></script>
</var-def>
<file action="write" type="text" path="xml/avgDownloads.xml">
            <var name="averageDownloads"></var>
</file>

Can anyone tell me how to solve this? Please!

1 answer:

Answer 0: (score: 0)

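You are closing your while loop (</while>) too early.

Only the statements inside <while>...</while> are repeated. The loop fetches five pages, but pageXml is overwritten on every iteration, so everything after the loop (writing pageXml.xml, extracting the 'url' nodes, etc.) only sees the last page of results.

So I think your </while> should come at the end of the code you posted here.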
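
As a rough illustration of that suggestion, here is a minimal sketch with the extraction step moved inside the loop. It rests on several assumptions rather than anything confirmed by the question: that the `start` parameter is a page number (as the question's own comment says), that the loop index counts up from 1, and that appending to a file is an acceptable way to collect results. The output path `xml/urls.txt` is a hypothetical name, and the `<config>` root element is added only so the fragment stands on its own.

```xml
<config charset="UTF-8">
    <var-def name="searchPhrase">Adele Rolling In The Deep</var-def>
    <var-def name="sortType">2</var-def>
    <var-def name="sortOrder">-1</var-def>

    <!-- Five iterations at 10 results per page should give about 50 results.
         The loop index 'start' is assumed to take the values 1 to 5. -->
    <while condition="true" index="start" maxloops="5">

        <!-- Fetch one page of results -->
        <var-def name="pageXml" overwrite="true">
            <http url="http://search.4shared.com/network/searchXml.jsp" charset="UTF8">
                <http-param name="q"><template>${searchPhrase}</template></http-param>
                <http-param name="sortType"><template>${sortType}</template></http-param>
                <http-param name="sortOrder"><template>${sortOrder}</template></http-param>
                <http-param name="start"><template>${start}</template></http-param>
            </http>
        </var-def>

        <!-- Process the page before the next iteration overwrites pageXml.
             action="append" is used so that each page adds to the file
             instead of replacing it (xml/urls.txt is a hypothetical path). -->
        <file action="append" type="text" path="xml/urls.txt">
            <xpath expression="//result-files/file/url/text()">
                <var name="pageXml"/>
            </xpath>
        </file>

    </while>
</config>
```

The same pattern would apply to the 'downloads-count' and 'size' extractions. A calculation that needs all 50 values at once, such as the average, would still have to run after the loop on the accumulated data rather than on a single page.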