Find a <p> (paragraph) tag with a specific class and extract its content using PHP Simple HTML DOM Parser

Time: 2018-05-07 12:23:54

Tags: php html parsing web-scraping simple-html-dom

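In the HTML below, I want to find the p tags with class ft04 that sit between "Work Experience" and "EDUCATION AND TRAINING", and extract their text, which contains the company names.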

<p class = "ft00">Introduction</p>
<p class = "ft00">John Smith</p>
<p class = "ft02">Email:</p>
<p class = "ft02">Phone Number:</p>
<p class = "ft00">John@gmail.com</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft04">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft04">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>

So far I can extract all of the data between "Work Experience" and "EDUCATION AND TRAINING", and that part works correctly. The code is as follows:

// Locate the "Work Experience" heading paragraph.
$fexp = $html->find('p[plaintext^=Work Experience]');
$items = array();
foreach ($fexp as $keye) {
    // Walk the following siblings until the next section heading.
    while ($keye->nextSibling()) {
        $keye = $keye->nextSibling();
        $varce = $keye->plaintext;
        if (trim($varce) == "EDUCATION AND TRAINING") {
            break;
        }
        // Every sibling is collected here, regardless of its class.
        $items[] = $varce;
    }
}
var_dump($items);

I'm close, but I can't work out how to keep only the ft04 paragraphs. Any help would be greatly appreciated, thanks :-)

1 Answer:

Answer 0: (score: 0)

Here is working code :) Note that it uses PHP's built-in DOMDocument rather than Simple HTML DOM.

$html = '<p class = "ft00">Introduction</p>
<p class = "ft00">John Smith</p>
<p class = "ft02">Email:</p>
<p class = "ft02">Phone Number:</p>
<p class = "ft00">John@gmail.com</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft04">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft04">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$items = array();
$found = false; // becomes true once the "Work Experience" heading is seen

foreach ($doc->getElementsByTagName('p') as $p) {
    $text = strtolower(trim($p->nodeValue));
    if ($text == 'work experience') {
        $found = true;
    }
    // Collect only class "ft04" paragraphs that follow the heading.
    if ($found && strtolower(trim($p->getAttribute('class'))) == 'ft04') {
        $items[] = $p->nodeValue;
    }
    // Stop at the next section heading.
    if ($text == 'education and training') {
        break;
    }
}

print_r($items);

Output

Array
(
    [0] => ABC Company
    [1] => XYZ Company
)
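As a possible alternative, the same filtering can be sketched as a single XPath query with PHP's built-in DOMXPath. This is only a rough sketch under a few assumptions: `extractBetween` is a hypothetical helper name (it is not part of any library), the paragraphs are all siblings as in the sample HTML, the heading texts and class name contain no double quotes, and, unlike the loop above, the heading comparison is case-sensitive.

```php
<?php
// Hypothetical helper (illustration only): returns the text of every
// <p class="$class"> that has the $start heading before it and the
// $end heading after it among its sibling <p> elements.
function extractBetween(string $html, string $start, string $end, string $class): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally so imperfect markup stays quiet.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Assumes $class, $start and $end contain no double quotes.
    $query = sprintf(
        '//p[normalize-space(@class)="%s"]'
        . '[preceding-sibling::p[normalize-space()="%s"]]'
        . '[following-sibling::p[normalize-space()="%s"]]',
        $class,
        $start,
        $end
    );

    $items = array();
    foreach ($xpath->query($query) as $p) {
        $items[] = trim($p->nodeValue);
    }
    return $items;
}

// Usage with a shortened version of the sample HTML.
$html = '<p class = "ft00">John Smith</p>
<p class = "ft00">Work Experience</p>
<p class = "ft00">27 July 2017</p>
<p class = "ft04">ABC Company</p>
<p class = "ft00">19 May 2018</p>
<p class ="ft04">XYZ Company</p>
<p class = "ft00">EDUCATION AND TRAINING</p>';

$items = extractBetween($html, 'Work Experience', 'EDUCATION AND TRAINING', 'ft04');
print_r($items); // lists ABC Company and XYZ Company
```

The XPath version avoids the manual `$found` flag, at the cost of an exact-match comparison on the heading text and class attribute.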

Hope it helps.