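HTML DOM parser: extract the href from a sibling span

Date: 2014-04-16 06:28:22

Tags: php simple-html-dom

This is my HTML file. It contains dates and links inside <span> tags in a table. Can anyone help me find the link for a particular date?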

<table>
<tbody>
<tr class="c0">
<td class="c11">
<td class="c8">
<ul class="c2 lst-kix_h6z8amo254ry-0 start">
<li class="c1">
<span>1st Apr 2014 - </span>
<span class="c6"><a class="c4" href="/link.html">View</a>
</span>
</li>
</ul>
</td>
</td>
</tr>
</tbody>
</table>

I want to retrieve the link that belongs to a particular date.

My code looks like this:

include('simple_html_dom.php');    
$html = file_get_html('link.html');
//store the links in array
foreach($html->find('span') as $value)
{
    //echo $value->plaintext . '<br />';
    $date = $value->plaintext;

    if (strpos($date,$compare_text)) {
         //$linkeachday = $value->find('span[class=c1]')->href;
        //$day_url[] = $value->href;
        //$day_url = Array("text" => $value->plaintext);
        $day_url = Array("text" => $date, "link" =>$linkeachday);
        //echo $value->next_sibling (a);
    }
}

$spans = $html->find('table',0)->find('li')->find('span');
echo $spans;
 $num = null;
 foreach($spans as $span){
     if($span->plaintext == $compare_text){
        $next_span = $span->next_sibling();
        $num = $next_span->plaintext;
         echo($num);    
        break; 
     }
 }
 echo($num);

3 answers:

Answer 0 (score: 0)

I don't know Simple HTML DOM, but PHP's built-in DOM library should be enough.

Assuming your date is in a string like this...

$date = '1st Apr 2014';

You can easily find the corresponding link with an XPath expression. For example:

$doc = new DOMDocument();
$doc->loadHTMLFile('link.html');

$xp = new DOMXPath($doc);
$query = sprintf('//span[starts-with(., "%s")]/following-sibling::span/a', $date);

$links = $xp->query($query);
if ($links->length) {
    $href = $links->item(0)->getAttribute('href');
}
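To try the query without a file on disk, the same expression can be run against an inline copy of the list item from the question. This is only a sketch: the function name findLinkForDate and the $sample string are made up for illustration, and it assumes the date contains no double quotes, since the date is embedded directly in the XPath expression.

```php
<?php
// Hypothetical helper (not part of the original answer): returns the href of
// the link in the span that follows the span starting with $date, or null.
function findLinkForDate(string $html, string $date): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    // Assumption: $date contains no double quotes.
    $query = sprintf('//span[starts-with(., "%s")]/following-sibling::span/a', $date);

    $links = $xp->query($query);
    if ($links !== false && $links->length > 0) {
        return $links->item(0)->getAttribute('href');
    }
    return null;
}

// Sample markup modelled on the list item in the question.
$sample = '<ul><li class="c1">'
        . '<span>1st Apr 2014 - </span>'
        . '<span class="c6"><a class="c4" href="/link.html">View</a></span>'
        . '</li></ul>';

var_dump(findLinkForDate($sample, '1st Apr 2014')); // the href "/link.html"
var_dump(findLinkForDate($sample, '2nd Apr 2014')); // null, no such date
```

Returning null when nothing matches lets the caller tell "no link for this date" apart from an empty href.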

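Answer 1 (score: 0)

Your last example was on the right track...

I modified it a bit to get the following. It grabs all the spans, tests whether each one starts with the searched text and, if so, displays the content of its next sibling, provided there is one (see the code comments):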

$input =  <<<_DATA_
    <table>
        <tbody>
            <tr class="c0">
                <td class="c11">
                    <td class="c8">
                        <ul class="c2 lst-kix_h6z8amo254ry-0 start">
                            <li class="c1">
                                <span>1st Apr 2013 - </span>
                                <span>1st Apr 2014 - </span>
                                <span class="c6">
                                    <a class="c4" href="/link.html">View</a>
                                </span>
                                <span>1st Apr 2015 - </span>
                            </li>
                        </ul>
                    </td>
                </td>
            </tr>
        </tbody>
    </table>
_DATA_;

// Load the Simple HTML DOM library (same path as in the question)
include('simple_html_dom.php');

// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($input);

// Searched value
$searchDate = '1st Apr 2014';

// Find all the spans that are direct children of an li inside the table
$spans = $html->find('table li > span');

// Loop through all the spans
foreach ($spans as $span) {
    // If the span starts with the searched text && has a following sibling
    if ( strpos($span->plaintext, $searchDate) === 0 && $sibling = $span->next_sibling()) {
        // Then print its text content
        echo $sibling->plaintext;    // or ->innertext for raw content
        // And stop (if only one result is needed)
        break;
    }
}

Output:

View

For the string comparison, you could also (preferably) use a regular expression...

So in the code above, add this line to build your pattern:

$pattern = sprintf('~^\s*%s~i', preg_quote($searchDate, '~'));

Then use preg_match to test for a match:

if ( preg_match($pattern, $span->plaintext) && $sibling = $span->next_sibling()) {
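To see what this pattern accepts, it can be tried on a few strings without the parser. A small sketch: the function name startsWithDate and the sample strings are assumptions chosen for illustration only.

```php
<?php
// Hypothetical helper: true when $text begins with $searchDate,
// ignoring leading whitespace and letter case.
function startsWithDate(string $text, string $searchDate): bool
{
    // preg_quote escapes regex metacharacters and the '~' delimiter.
    $pattern = sprintf('~^\s*%s~i', preg_quote($searchDate, '~'));
    return preg_match($pattern, $text) === 1;
}

$searchDate = '1st Apr 2014';

var_dump(startsWithDate('1st Apr 2014 - ', $searchDate));      // true
var_dump(startsWithDate("\n   1ST APR 2014 - ", $searchDate)); // true
var_dump(startsWithDate('21st Apr 2014 - ', $searchDate));     // false
var_dump(startsWithDate('View', $searchDate));                 // false
```

Compared with strpos(...) === 0, the pattern tolerates leading whitespace and differences in case, which helps when the span text is indented in the source markup.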

Answer 2 (score: 0)

include('simple_html_dom.php');

$html = file_get_html('link.html');
$compare_text = "1st Apr 2013";

// Collect every span in the second table (the index is zero-based)
$tds = $html->find('table', 1)->find('span');

$num = 0;
foreach ($tds as $td) {
    // Match spans whose text contains the date
    if (strpos($td->plaintext, $compare_text) !== false) {
        // The link sits in the span that follows the date span
        $next_td = $td->next_sibling();
        if ($next_td) {
            foreach ($next_td->find('a') as $elm) {
                $num = $elm->href;
            }
        }
        //$day_url = array($day => array(daylink => $day, text => $td->plaintext, link => $num));
        echo $td->plaintext . "<br />";
        echo $num . "<br />";
    }
}