Is there a way to collect data from sub-links in Web-Harvest?
Here is the XML snippet I am using:
<loop item="item" index="i">
<list><var name="products"/></list>
<body>
<xquery>
<xq-param name="item"><var name="item"/></xq-param>
<xq-expression><![CDATA[
declare variable $item as node() external;
for $i in $item//div[1]/p/a[@trace='auction'][1]
let $url := data($i/@href)
How can I fetch data from this new URL, which is now held in $url?
Please help. Thanks.
Answer 0 (score: 0)
You just need to create another variable to hold this information. I have put together a sample to make it easy to follow. Take a look:
SCRIPT:
<?xml version="1.0" encoding="UTF-8"?>
<config>
<var-def name="MainSite">http://www.appszoom.com/android_games/arcade_and_action</var-def>
<loop item="titles" index="i">
<list>
<xpath expression="//li[@class='app captureLinkBox']/div/div/span/a">
<html-to-xml>
<http url="${MainSite}"></http>
</html-to-xml>
</xpath>
</list>
<body>
<var-def name="titleURL">
<xpath expression="data(/a/@href)">
<var name="titles"/>
</xpath>
</var-def>
<file action="append" path="D:\navin.xml">
<xquery>
<xq-param name="titles"><template>${titles}</template></xq-param>
<xq-param name="titleURLContent">
<html-to-xml>
<http url="${titleURL}"></http>
</html-to-xml>
</xq-param>
<xq-expression>
<![CDATA[
declare variable $titles as node() external;
declare variable $titleURLContent as node() external;
<game>
<title>{$titles/a/text()}</title>
<downloads>{$titleURLContent//*[@id="left-bar"]/p[2]/span/text()}</downloads>
</game>
]]>
</xq-expression>
</xquery>
</file>
</body>
</loop>
</config>
OUTPUT:
<game>
<title>Clash of Clans</title>
<downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
<title>DEER HUNTER 2014</title>
<downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
<title>Subway Surfers</title>
<downloads>100,000,000 - 500,000,000</downloads>
</game>
<game>
<title>RoboCop™</title>
<downloads>5,000,000 - 10,000,000</downloads>
</game>
<game>
<title>DragonFlight for Kakao</title>
<downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
<title>Castle Clash</title>
<downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
<title>Sonic Dash</title>
<downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
<title>Injustice: Gods Among Us</title>
<downloads>1,000,000 - 5,000,000</downloads>
</game>
<game>
<title>Banana Kong</title>
<downloads>10,000,000 - 50,000,000</downloads>
</game>
<game>
<title>Temple Run 2</title>
<downloads>100,000,000 - 500,000,000</downloads>
</game>
Answer 1 (score: 0)
You did not provide complete code for me to run and check, but this should help:
<config>
<loop item="item" index="i">
<list><var name="products"/></list>
<body>
<var-def name="new_url">
<xquery>
<xq-param name="item"><var name="item"/></xq-param>
<xq-expression><![CDATA[
declare variable $item as node() external;
for $i in $item//div[1]/p/a[@trace='auction'][1]
let $url := data($i/@href)
return $url
]]></xq-expression>
</xquery>
</var-def>
<!-- now your new url is saved in webharvest variable new_url and you are free to run a
new webharvest http request using it -->
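<var-def name="new_page_content">
<!-- html-to-xml cleans the downloaded HTML so that xpath can query it -->
<html-to-xml>
<http url="${new_url}"/>
</html-to-xml>
</var-def>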
<!-- now the content of the new page has been downloaded and saved in new variable
new_page_content and you are free to query it further should you want to -->
<var-def name="contact">
<xpath expression="//a[contains(., 'contact')]/@href">
<var name="new_page_content"/>
</xpath>
</var-def>
</body>
</loop>
</config>
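One thing to watch when following sub-links: the extracted href may be relative, in which case the second request needs an absolute URL. Below is a minimal sketch of how that could be handled. It assumes the built-in `sys.fullUrl(pageUrl, link)` helper is available in your Web-Harvest version, and `base_url` is a hypothetical variable (not in the original question) holding the address of the page the links came from.

```xml
<!-- base_url is a hypothetical variable: the page the links were scraped from -->
<var-def name="base_url">http://www.example.com/list</var-def>

<!-- resolve the (possibly relative) href held in new_url against base_url -->
<var-def name="absolute_url">
    <template>${sys.fullUrl(base_url.toString(), new_url.toString())}</template>
</var-def>

<!-- fetch the linked page and convert it to XML so it can be queried -->
<var-def name="new_page_content">
    <html-to-xml>
        <http url="${absolute_url}"/>
    </html-to-xml>
</var-def>
```

If the hrefs on your site are already absolute, this step can be skipped and `new_url` used directly.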