From the HTML page https://www.topazlabs.com/downloads I want to extract the Windows version number of Topaz ReMask as a string: v5.0.1
I download the HTML with curl.
I use this query:
->finder->query("//div[contains(@class, 'wpb_wrapper')]/.//a[text()[contains(.,'Topaz ReMask')]]/../../../div");
OR
...->finder->query("//div[contains(@class, 'wpb_wrapper')]//a[text()[contains(.,'Topaz ReMask')]]/../../../div");
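I then go through all the returned DIV elements, looking for the one that contains both of the strings "/" and "(Win)", like this: $versionString = Find($nodes, "/", "(Win)");
Finally, I process that text to extract only the Windows version.
It works, but can it be simplified?
The relevant part of the page's HTML is: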
...
<div class="wpb_wrapper">
<div class="vc_empty_space" style="height: 20px">
<span class="vc_empty_space_inner">
</span>
</div>
<div id="mpc_textblock-975b2251c2a82c7" class="mpc-textblock mpc-init mpc-typography--preset_2 ">
<p>
<a href="/remask" target="blank">Topaz ReMask</a>
</p>
</div>
<div class="mpc-tooltip-wrap" data-id="mpc_textblock-615b2251c2a8c4a">
<div id="mpc_textblock-615b2251c2a8c4a" class="mpc-textblock mpc-init mpc-typography--preset_0 ">
<p>
<em>v5.0.3 (Mac) / v5.0.1 (Win)
</em>
</p>
</div>
<div id="mpc_tooltip-925b2251c2a8d2f" class="mpc-tooltip mpc-init mpc-typography--preset_4 mpc-position--left mpc-can-hover mpc-trigger--hover ">Mac Updated November 4, 2016
<br>Windows Updated November 21, 2016
<div class="mpc-arrow">
</div>
</div>
</div>
<div id="mpc_textblock-475b2251c2a9601" class="mpc-textblock mpc-init ">
<p>The quickest and easiest way to mask your photo.
</p>
</div>
</div>
...
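Answer 0 (score: 0)
You can rely on the text content alone. With DOMXpath::evaluate() you can fetch a string directly; this example fetches the Windows update date: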
$document= new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);
$expression = "substring-after(
//div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')],
'Windows Updated '
)";
var_dump($xpath->evaluate($expression));
Output:
string(24) "November 21, 2016
"
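The XPath expression, step by step:
Any div ...
//div
... whose p descendant contains the text "Topaz ReMask" ...
//div[contains(.//p, 'Topaz ReMask')]
... its descendant text nodes that start with "Windows Updated " ...
//div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')]
... and the text after "Windows Updated ":
substring-after(//div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')], 'Windows Updated ')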
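The same approach can be pointed at the version line instead of the update date. The following is only a minimal sketch: it assumes the markup keeps the shape shown in the question (an em element containing "(Win)" under the same div as the "Topaz ReMask" link), and the $html sample is a trimmed, hypothetical copy of that snippet rather than the live page. XPath 1.0 has no regular expressions, so the final step uses preg_match() on the PHP side.

```php
<?php
// Hypothetical sample markup, trimmed from the snippet in the question.
$html = <<<'HTML'
<div class="wpb_wrapper">
  <div class="mpc-textblock"><p><a href="/remask" target="blank">Topaz ReMask</a></p></div>
  <div class="mpc-tooltip-wrap">
    <div class="mpc-textblock"><p><em>v5.0.3 (Mac) / v5.0.1 (Win)
    </em></p></div>
  </div>
</div>
HTML;

$document = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

// string() yields the text of the first matching <em>, or '' if nothing matches.
$text = $xpath->evaluate(
    "string(//div[contains(.//p, 'Topaz ReMask')]//em[contains(., '(Win)')])"
);

// Capture the version token that directly precedes "(Win)".
$version = preg_match('/(v[\d.]+)\s*\(Win\)/', $text, $m) ? $m[1] : null;

var_dump($version); // string(6) "v5.0.1"
```

If the page layout changes so that no em element matches, $text is an empty string and $version ends up null, so the caller can tell "not found" apart from a real version.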