Need help removing whitespace from a div

Posted: 2014-04-25 15:39:36

Tags: php parsing dom xpath nodes

I'm new to DOM parsing, but I've figured out most of it. I'm just having trouble removing the &nbsp; from a div.

Here is my PHP:

    function parseDOM($url) {
        $dom = new DOMDocument;
        @$dom->loadHTMLFile($url);
        $xpath = new DOMXPath($dom);
        $movies = array();
        foreach ($xpath->query('//div[@class="mshow"]') as $movie) {
            $item = array();
            $links = $xpath->query('.//a', $movie);
            $item['trailer'] = $links->item(0)->getAttribute('href');
            $item['reviews'] = $links->item(1)->getAttribute('href');
            $item['link'] = $links->item(2)->getAttribute('href');
            $item['title'] = $links->item(2)->nodeValue;
            $item['rating'] = trim($xpath->query('.//strong/following-sibling::text()',
                $movie)->item(0)->nodeValue);
            // Query the date and showtime divs once per movie, then pair them by index
            $dates = $xpath->query('.//div[@class="rsd"]', $movie);
            $times = $xpath->query('.//div[@class="rst"]', $movie);
            for ($i = 0; $i < $dates->length; $i++) {
                $item['datetime'][] = $dates->item($i)->nodeValue . $times->item($i)->nodeValue;
            }
            $movies[] = $item;
        }
        return $movies;
    }

    $url = 'http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1';
    $movies = parseDOM($url);
    foreach ($movies as $key => $value) {
        echo $value['title'] . '<br>';
        echo $value['link'] . '<br>';
        echo $value['rating'] . '<br>';
        foreach ($value['datetime'] as $datetime) {
            echo $datetime . '<br>';
        }                     
    }                 

This is what the HTML looks like:

    <div class="rst" >6:45pm &nbsp;&nbsp;9:30pm &nbsp;&nbsp;</div>

Is there something I can add to the XPath query to accomplish this? I did try applying strip_tags to $times->item($i)->nodeValue, but it still prints out like: Thu, May 01: 6:45pm   9:30pm  Â
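On doing this inside the XPath query itself: XPath 1.0's normalize-space() only treats space, tab, CR and LF as whitespace, so on its own it leaves U+00A0 (the decoded &nbsp;) alone. translate() can first map that character to an ordinary space. The sketch below is an illustration, not the thread's method; it assumes only the div shown above, loaded from a string instead of the live URL, and uses DOMXPath::evaluate() because the expression returns a string rather than a node list.

```php
<?php
// Load just the question's div from a string instead of fetching the page.
$dom = new DOMDocument();
$dom->loadHTML('<div class="rst" >6:45pm &nbsp;&nbsp;9:30pm &nbsp;&nbsp;</div>');
$xpath = new DOMXPath($dom);
$div = $xpath->query('//div[@class="rst"]')->item(0);

// translate() maps each character of its 2nd argument to the character at the
// same position in the 3rd. The PHP escape "\xc2\xa0" puts a literal U+00A0
// into the expression, which is turned into a plain space here.
// normalize-space() then collapses runs of spaces and trims both ends.
$times = $xpath->evaluate("normalize-space(translate(string(.), '\xc2\xa0', ' '))", $div);
var_dump($times); // string(13) "6:45pm 9:30pm"
```

In the question's loop, the same expression could be evaluated with `$times->item($i)` as the context node in place of reading its nodeValue.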

EDIT: str_replace("\xc2\xa0", '', $times->item($i)->nodeValue); seems to do the trick.
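Why the "\xc2\xa0" version works: by the time nodeValue is read, the parser has already decoded each &nbsp; entity into the character U+00A0, which the DOM extension returns as the UTF-8 bytes C2 A0. The literal text "&nbsp;" is no longer there, and trim() does not help either, because its default character list covers only ASCII whitespace. This minimal sketch assumes just the div from the question, loaded from a string rather than the live URL.

```php
<?php
// Load only the question's div from a string instead of fetching the page.
$dom = new DOMDocument();
$dom->loadHTML('<div class="rst" >6:45pm &nbsp;&nbsp;9:30pm &nbsp;&nbsp;</div>');
$xpath = new DOMXPath($dom);
$text = $xpath->query('//div[@class="rst"]')->item(0)->nodeValue;

// The entity text is gone; each nbsp is now the two bytes C2 A0.
var_dump(strpos($text, '&nbsp;'));         // bool(false)
var_dump(substr_count($text, "\xc2\xa0")); // int(4)

// So replacing the entity text changes nothing...
var_dump(str_replace('&nbsp;', '', $text) === $text); // bool(true)

// ...trim() leaves the trailing nbsp characters in place...
var_dump(trim($text) === $text); // bool(true)

// ...while replacing the UTF-8 byte sequence removes all four.
$clean = str_replace("\xc2\xa0", '', $text);
var_dump($clean); // string(14) "6:45pm 9:30pm "
```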

1 Answer:

Answer 0 (score: 1)

Try this:

    $times->item($i)->nodeValue = str_replace("&nbsp;", "", $times->item($i)->nodeValue);

It should remove every &nbsp;.


EDIT

Your line:

    $item['datetime'][] = $dates->item($i)->nodeValue . $times->item($i)->nodeValue;

becomes:

    $item['datetime'][] = $dates->item($i)->nodeValue
        . str_replace("&nbsp;", "", $times->item($i)->nodeValue);

EDIT 2

If str_replace doesn't work, try str_ireplace as suggested in the comments.

If that still doesn't work, you can also try:

    preg_replace("#&nbsp;#", "", $times->item($i)->nodeValue);

EDIT 3

You may have an encoding problem. See utf8_encode.
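The encoding hint can be made concrete: the stray Â in the question's output is what the bytes C2 A0 look like when a page is displayed as ISO-8859-1 (or Windows-1252) instead of UTF-8. This sketch only reproduces that misreading; it assumes the mbstring extension is installed and is not the utf8_encode approach itself.

```php
<?php
// nodeValue returns UTF-8, so one non-breaking space is the two bytes C2 A0.
$nbsp = "\xc2\xa0";

// A browser reading the page as ISO-8859-1 treats each byte as its own
// character: C2 is U+00C2 ("Â") and A0 is U+00A0 (a non-breaking space).
// Converting from ISO-8859-1 to UTF-8 reproduces that misreading.
$asSeen = mb_convert_encoding($nbsp, 'UTF-8', 'ISO-8859-1');
var_dump($asSeen === "\u{00C2}\u{00A0}"); // bool(true)
var_dump(bin2hex($asSeen));               // string(8) "c382c2a0"
```

If that is the cause, declaring the output encoding, for example with header('Content-Type: text/html; charset=utf-8') before the first echo, should make the browser show plain non-breaking spaces instead of Â.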

Or the quick-and-dirty solution:

    str_replace("Â", "", $times->item($i)->nodeValue);

Apollo