PHP - Extracting a value from a string

Date: 2011-06-10 12:54:03

Tags: php

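I have several strings that were fetched from another website using cURL. Each string contains the HTML of an entire page, but every page includes a paragraph like the following: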

<p>Displaying 1-15 of 15 items beginning with A</p>
<p>Displaying 1-20 of 33 items beginning with B</p>

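All I need to do is extract the total values (15 and 33) from the strings above.

I'm not sure what the best way to extract these values is.

Thanks :)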

3 answers:

Answer 0 (score: 7)

Brute-force approach:

http://php.net/manual/en/function.preg-match-all.php

preg_match_all('/<p>Displaying (\d+)-(\d+) of (\d+) items beginning with ([A-Z]+)<\/p>/', $subject, $matches);
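
To show where the totals end up in `$matches`, here is a minimal sketch. The `$subject` sample is made up from the two paragraphs in the question, and `#` is used as the pattern delimiter so the `/` in `</p>` needs no escaping. Under those assumptions, the third capture group holds the totals.

```php
<?php
// Hypothetical sample standing in for the HTML fetched with cURL.
$subject = '<p>Displaying 1-15 of 15 items beginning with A</p>'
         . '<p>Displaying 1-20 of 33 items beginning with B</p>';

// '#' as delimiter so the '/' in '</p>' does not terminate the pattern.
$pattern = '#<p>Displaying (\d+)-(\d+) of (\d+) items beginning with ([A-Z]+)</p>#';

// Returns the number of full matches found.
$count = preg_match_all($pattern, $subject, $matches);

// With the default PREG_PATTERN_ORDER, $matches[3] lists the third group
// (the total) of every match, in order of appearance.
$totals = array_map('intval', $matches[3]);

print_r($totals);
```

With the sample above, `$totals` contains 15 and 33, and `$matches[4]` holds the matching letters (A and B) in the same order.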

Answer 1 (score: 5)

Create a regular expression:

$regex = "/Displaying 1-([0-9]+) of ([0-9]+) items beginning with/";
preg_match($regex,$resultfromcurl,$match);

Something like this?
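
As a sketch of how this could be used on one page at a time, the snippet below wraps the match in a helper. The function name `extractTotal` and the sample string are hypothetical, and the pattern is a slight variation that accepts any range start instead of a fixed `1-`. It assumes PHP 7.1 or later for the nullable return type.

```php
<?php
// Hypothetical helper: returns the total from one page's HTML, or null if absent.
function extractTotal(string $html): ?int
{
    // Only the total is captured; the range start and end are matched but not kept.
    if (preg_match('/Displaying \d+-\d+ of (\d+) items beginning with/', $html, $match) === 1) {
        return (int) $match[1];
    }
    return null;
}

// Made-up stand-in for the string returned by cURL.
$resultfromcurl = '<html><body><p>Displaying 1-20 of 33 items beginning with B</p></body></html>';

var_dump(extractTotal($resultfromcurl));   // int(33)
var_dump(extractTotal('<p>no match</p>')); // NULL
```

Returning null rather than 0 lets the caller tell "paragraph not found" apart from a page that genuinely reports zero items.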

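Answer 2 (score: 1)

This may be a day late and a dollar short, but here are my 2 cents: this parses the HTML from a file, grabs the paragraphs, finds the matches, and puts all the relevant values into an array for later use.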

<?php

// Open your document
$doc = new DOMDocument();

// Parse the HTML
$doc->loadHTMLFile("html_doc.html");

// Find the paragraphs and loop through them
$paras = $doc->getElementsByTagName('p');

// Initialize value array
$range = array(); 

// Extract the value and put them in a useful data structure
for ($i = 0; $i < $paras->length; $i++) {
    $subject = $paras->item($i)->nodeValue;
    // Skip paragraphs that do not match, so $matches is never read when empty
    if (preg_match('/Displaying (\d+)-(\d+) of (\d+) items beginning with ([A-Z]+)/', $subject, $matches)) {
        $range[$matches[4]] = array(
            'start' => $matches[1],
            'stop'  => $matches[2],
            'total' => $matches[3]
        );
    }
}

foreach ($range as $begin => $values) {
    echo "\n$begin\n";
    echo "start: " . $values['start'] . "\n";
    echo "stop: " . $values['stop'] . "\n";
    echo "total: " . $values['total'] . "\n";
    echo "------\n";
}
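
Since the question has the pages in strings rather than files, a variant using `loadHTML` may fit better. This is only a sketch: the `$html` sample is invented, and `libxml_use_internal_errors` is called on the assumption that the fetched markup may not be valid and would otherwise trigger parser warnings.

```php
<?php
// Hypothetical stand-in for a page fetched with cURL.
$html = '<html><body><p>Intro</p><p>Displaying 1-15 of 15 items beginning with A</p></body></html>';

// Collect libxml parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);   // parse from a string instead of a file
libxml_clear_errors();

// Map each starting letter to its total.
$totals = array();
foreach ($doc->getElementsByTagName('p') as $para) {
    // Paragraphs without the "of N items" text are simply skipped.
    if (preg_match('/of (\d+) items beginning with ([A-Z]+)/', $para->nodeValue, $m)) {
        $totals[$m[2]] = (int) $m[1];
    }
}

print_r($totals);
```

For the sample string, `$totals` ends up with a single entry mapping A to 15.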