Question

Possible duplicate:
How to parse and process HTML with PHP?

I am trying to extract values from some HTML. This is part of the HTML document I am trying to get the values from.

    <input type="hidden" id="first"
        value='&euro;218.33' />
    <input type="hidden" id="second"
        value='&euro;291.08' />
    <input type="hidden" id="third"
        value='&euro;344.77' />

I used the following preg_match_all call, where $buffer contains the entire HTML of the page I am searching.

if (preg_match_all('/<input type="hidden" id="(.+?)" value=\'&euro;(.+?)\'/', $buffer, $matches))
{
   echo "FOUND";
   echo  $matches[2][0] . " " . $matches[2][1] . " " . $matches[2][2] . "\n";
}

This preg_match_all call does not find any matches. Any suggestions?

Answer 1

A very simple solution is to use PHP Simple HTML DOM Parser and its str_get_html function.

Sample HTML

include "simple_html_dom.php" ;

$html =" <input type=\"hidden\" id=\"first\"
    value='&euro;218.33' />
<input type=\"hidden\" id=\"second\"
    value='&euro;291.08' />
<input type=\"hidden\" id=\"third\"
    value='&euro;344.77' />";

Usage

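// Parse the string into a DOM object, then print each input's value attribute.
$dom = str_get_html($html);
foreach ($dom->find('input') as $element)
    echo $element->value . "\n"; // double quotes, so \n is a real newline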

Output

€218.33
€291.08
€344.77
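
If adding a third-party library is not an option, roughly the same thing can be sketched with PHP's bundled DOM extension. This is a different technique from the one in this answer, not a drop-in equivalent; it assumes ext-dom is enabled, and the variable names are only illustrative.

```php
<?php
// Sample markup copied from the question.
$html = <<<'HTML'
<input type="hidden" id="first"
    value='&euro;218.33' />
<input type="hidden" id="second"
    value='&euro;291.08' />
<input type="hidden" id="third"
    value='&euro;344.77' />
HTML;

// Collect libxml warnings internally instead of printing them,
// since real-world pages are rarely perfectly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Map each input's id to its value attribute.
$values = [];
foreach ($doc->getElementsByTagName('input') as $input) {
    $values[$input->getAttribute('id')] = $input->getAttribute('value');
}

// The parser decodes entities, so each value should begin with a real euro sign.
print_r($values);
```

Because the attributes are read from a parsed tree, the line break between id and value no longer matters.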

Answer 2

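This regex returns nothing because id and value are separated by more than a single space (a line break plus indentation in your HTML):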

preg_match_all('/<input type="hidden" id="(.+?)"[.\s\t\r\n\v\f]*?value=\'&euro;(.+?)\'/', $buffer, $matches)

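Note the [.\s\t\r\n\v\f]*? just before value=. It consumes whatever whitespace sits between the closing quote of id and value=, so spaces, tabs, line breaks and the like will not break your expression. (Inside a character class the . is a literal dot rather than "any character", and \s already covers spaces, tabs and line breaks, so \s* would do the same job for this markup.)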
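
To see the difference, here is a small self-contained sketch that runs the question's pattern and a relaxed variant against the sample markup. It uses \s+ as a shorter stand-in for the character class above, which is my simplification rather than the exact pattern from this answer.

```php
<?php
// Sample markup copied from the question; the line break before value= is what matters.
$buffer = <<<'HTML'
<input type="hidden" id="first"
    value='&euro;218.33' />
<input type="hidden" id="second"
    value='&euro;291.08' />
<input type="hidden" id="third"
    value='&euro;344.77' />
HTML;

// Original pattern: exactly one literal space between the two attributes.
$strict = '/<input type="hidden" id="(.+?)" value=\'&euro;(.+?)\'/';
// Relaxed pattern: \s+ accepts any run of spaces, tabs and line breaks.
$relaxed = '/<input type="hidden" id="(.+?)"\s+value=\'&euro;(.+?)\'/';

$strictCount  = preg_match_all($strict, $buffer, $ignored);
$relaxedCount = preg_match_all($relaxed, $buffer, $matches);

echo "strict: $strictCount, relaxed: $relaxedCount\n"; // strict: 0, relaxed: 3
echo implode(' ', $matches[2]), "\n";                  // 218.33 291.08 344.77
```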

Answer 3

How about this?

if (preg_match_all('/<input type="hidden" id="(.+?)".+?value=\'&euro;(.+?)\'/s', $buffer, $matches))
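
The important part is the trailing s modifier (PCRE's "dotall" mode), which lets . match line breaks as well. A minimal sketch comparing the same pattern with and without it, using the question's markup and illustrative variable names:

```php
<?php
// Sample markup copied from the question.
$buffer = <<<'HTML'
<input type="hidden" id="first"
    value='&euro;218.33' />
<input type="hidden" id="second"
    value='&euro;291.08' />
<input type="hidden" id="third"
    value='&euro;344.77' />
HTML;

$pattern = '<input type="hidden" id="(.+?)".+?value=\'&euro;(.+?)\'';

// Without /s the dot stops at the line break after the id attribute.
$countWithout = preg_match_all('/' . $pattern . '/', $buffer, $ignored);
// With /s the dot also matches "\n", so .+? can bridge the two lines.
$countWith = preg_match_all('/' . $pattern . '/s', $buffer, $matches);

echo "without s: $countWithout, with s: $countWith\n"; // without s: 0, with s: 3
echo implode(' ', $matches[2]), "\n";                  // 218.33 291.08 344.77
```

One caveat with this approach: because .+? may now cross line breaks, it can also run past the end of a tag if an input has no value attribute, so it is worth checking the captured ids on real pages.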
