PHP 7: XPath - how can this query be simplified?

Asked: 2018-06-14 11:42:10

Tags: php xpath

From the HTML page https://www.topazlabs.com/downloads I want to extract the Windows version number of Topaz ReMask as a string: v5.0.1

  1. I download the HTML with curl.

  2. I run one of these queries:

     ...->finder->query("//div[contains(@class, 'wpb_wrapper')]/.//a[text()[contains(.,'Topaz ReMask')]]/../../../div");

     OR

     ...->finder->query("//div[contains(@class, 'wpb_wrapper')]//a[text()[contains(.,'Topaz ReMask')]]/../../../div");

  3. Then I go through all the returned DIV tags looking for the one that contains both strings "/" and "(Win)", like this: $versionString = Find($nodes, "/", "(Win)");

  4. I process the text to extract only the Windows version.

It works, but can it be simplified?

The part of the page's HTML that I use is:

      ...
      <div class="wpb_wrapper">
        <div class="vc_empty_space" style="height: 20px">
          <span class="vc_empty_space_inner">
          </span>
        </div>
        <div id="mpc_textblock-975b2251c2a82c7" class="mpc-textblock mpc-init mpc-typography--preset_2 ">
          <p>
            <a href="/remask" target="blank">Topaz ReMask</a>
          </p>
        </div>
        <div class="mpc-tooltip-wrap" data-id="mpc_textblock-615b2251c2a8c4a">
          <div id="mpc_textblock-615b2251c2a8c4a" class="mpc-textblock mpc-init mpc-typography--preset_0 ">
            <p>
              <em>v5.0.3 (Mac) / v5.0.1 (Win)
              </em>
            </p>
          </div>
          <div id="mpc_tooltip-925b2251c2a8d2f" class="mpc-tooltip mpc-init mpc-typography--preset_4 mpc-position--left mpc-can-hover mpc-trigger--hover ">Mac Updated November 4, 2016
            <br>Windows Updated November 21, 2016
            <div class="mpc-arrow">
            </div>
          </div>
        </div>
        <div id="mpc_textblock-475b2251c2a9601" class="mpc-textblock mpc-init ">
          <p>The quickest and easiest way to mask your photo.
          </p>
        </div>
      </div>
      ...
      

1 answer:

Answer 0 (score: 0)

Well, you can base the lookup on the text content alone. With DOMXpath::evaluate() you can fetch the string directly:

$document= new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXpath($document);

$expression = "substring-after(
  //div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')],
  'Windows Updated '
)";

var_dump($xpath->evaluate($expression));

Output:

string(24) "November 21, 2016
      "
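
The trailing line break and spaces in that output come from the text node itself. If they are unwanted, wrapping the expression in XPath's normalize-space() is one way to drop them. A minimal sketch, assuming a reduced copy of the question's markup in $html (the fragment below is illustrative, not the live page) and using libxml_use_internal_errors() only to keep parser warnings quiet:

```php
<?php
// Reduced, hypothetical copy of the markup shown in the question.
$html = <<<'HTML'
<div class="wpb_wrapper">
  <div><p><a href="/remask">Topaz ReMask</a></p></div>
  <div class="mpc-tooltip-wrap">
    <div><p><em>v5.0.3 (Mac) / v5.0.1 (Win)
    </em></p></div>
    <div class="mpc-tooltip">Mac Updated November 4, 2016
      <br>Windows Updated November 21, 2016
      <div class="mpc-arrow"></div>
    </div>
  </div>
</div>
HTML;

// Collect HTML parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Same expression as above, wrapped in normalize-space() to strip
// leading/trailing whitespace and collapse internal runs of whitespace.
$expression = "normalize-space(substring-after(
  //div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')],
  'Windows Updated '
))";

$updated = $xpath->evaluate($expression);
var_dump($updated); // string(17) "November 21, 2016"
```

Calling PHP's trim() on the result of the original expression would be an equivalent alternative.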
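The XPath expression

  • Fetch any div that has a p with the text Topaz ReMask (strictly, contains() tests the string value of the first p descendant), ...
    //div[contains(.//p, 'Topaz ReMask')]
  • ... then its descendant text nodes that start with Windows Updated ...
    //div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')]
  • ... and extract the text after Windows Updated:
    substring-after(//div[contains(.//p, 'Topaz ReMask')]//text()[starts-with(., 'Windows Updated ')], 'Windows Updated ')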
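
The question asks for the version string (v5.0.1) rather than the update date, and the same text-based approach could plausibly be pointed at the em element instead. The sketch below is an assumption, not something verified against the live page: it relies on the version line keeping the form "vX (Mac) / vY (Win)" seen in the quoted HTML, and extractWindowsVersion() is a hypothetical helper name introduced here for illustration.

```php
<?php
/**
 * Hypothetical helper: returns the Windows version of Topaz ReMask,
 * or an empty string if the expression matches nothing.
 * Assumes the version line looks like "v5.0.3 (Mac) / v5.0.1 (Win)".
 */
function extractWindowsVersion(string $html): string
{
    // Collect HTML parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);

    $document = new DOMDocument();
    $document->loadHTML($html);
    $xpath = new DOMXPath($document);

    // Take the em that mentions "(Win)" inside the Topaz ReMask block,
    // keep what follows "/ ", then cut everything from " (Win)" onwards.
    $expression = "substring-before(
      substring-after(
        //div[contains(.//p, 'Topaz ReMask')]//em[contains(., '(Win)')],
        '/ '
      ),
      ' (Win)'
    )";

    return $xpath->evaluate($expression);
}

// Reduced, hypothetical copy of the markup shown in the question.
$html = <<<'HTML'
<div class="wpb_wrapper">
  <div><p><a href="/remask">Topaz ReMask</a></p></div>
  <div class="mpc-tooltip-wrap">
    <div><p><em>v5.0.3 (Mac) / v5.0.1 (Win)
    </em></p></div>
  </div>
</div>
HTML;

var_dump(extractWindowsVersion($html)); // string(6) "v5.0.1"
```

Because both substring functions return an empty string when nothing matches, the caller can treat an empty result as "version not found", for example after a change to the page layout.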