I'm new to DOM parsing, but I've figured most of it out. I'm just having trouble removing &nbsp; from a div.
Here's my PHP:
function parseDOM($url) {
    $dom = new DOMDocument;
    @$dom->loadHTMLFile($url);
    $xpath = new DOMXPath($dom);
    $movies = array();
    foreach ($xpath->query('//div[@class="mshow"]') as $movie) {
        $item = array();
        $links = $xpath->query('.//a', $movie);
        $item['trailer'] = $links->item(0)->getAttribute('href');
        $item['reviews'] = $links->item(1)->getAttribute('href');
        $item['link'] = $links->item(2)->getAttribute('href');
        $item['title'] = $links->item(2)->nodeValue;
        $item['rating'] = trim($xpath->query('.//strong/following-sibling::text()', $movie)->item(0)->nodeValue);
        $i = 0;
        foreach ($xpath->query('.//div[@class="rsd"]', $movie) as $date) {
            $dates = $xpath->query('.//div[@class="rsd"]', $movie);
            $times = $xpath->query('.//div[@class="rst"]', $movie);
            $item['datetime'][] = $dates->item($i)->nodeValue . $times->item($i)->nodeValue;
            $i += 1;
        }
        $movies[] = $item;
    }
    return $movies;
}
$url = 'http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1';
$movies = parseDOM($url);
foreach ($movies as $key => $value) {
    echo $value['title'] . '<br>';
    echo $value['link'] . '<br>';
    echo $value['rating'] . '<br>';
    foreach ($value['datetime'] as $datetime) {
        echo $datetime . '<br>';
    }
}
Here's what the HTML looks like:
<div class="rst" >6:45pm &nbsp;9:30pm &nbsp;</div>
Is there something I can add to the XPath query to achieve this? I did try applying strip_tags to $times->item($i)->nodeValue, but it still prints out like: Thu, May 01: 6:45pm   9:30pm  Â
Edit: str_replace("\xc2\xa0", '', $times->item($i)->nodeValue); seems to do the trick.
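For anyone wondering why the byte sequence works where the entity name does not: DOMDocument decodes &nbsp; while parsing and hands text back as UTF-8, so nodeValue contains the character U+00A0 (bytes C2 A0) rather than the literal text "&nbsp;". Shown as ISO-8859-1, those two bytes render as "Â" followed by a non-breaking space, which matches the stray "Â" in the output above. Here is a minimal sketch that checks this, assuming a hypothetical one-line snippet in place of the real page:

```php
<?php
// Hypothetical snippet mirroring the markup from the question (not the live page).
$html = '<div class="rst">6:45pm &nbsp;9:30pm &nbsp;</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$raw = $xpath->query('//div[@class="rst"]')->item(0)->nodeValue;

// The entity was decoded by the parser, so the literal text "&nbsp;" is not present...
$hasEntity = strpos($raw, '&nbsp;') !== false;
// ...but the UTF-8 encoding of U+00A0 (bytes C2 A0) is.
$hasNbspBytes = strpos($raw, "\xc2\xa0") !== false;

// Remove those two bytes, then trim the ordinary trailing space.
$clean = trim(str_replace("\xc2\xa0", '', $raw));

var_dump($hasEntity, $hasNbspBytes, $clean);
```

bin2hex($raw) is another quick way to see the c2a0 pairs directly.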
Answer 0: (score: 1)
Try this:
$times->item($i)->nodeValue = str_replace("&nbsp;","",$times->item($i)->nodeValue);
It should remove every &nbsp;.
Your line:
$item['datetime'][] = $dates->item($i)->nodeValue . $times->item($i)->nodeValue;
becomes:
$item['datetime'][] = $dates->item($i)->nodeValue
    . str_replace("&nbsp;","",$times->item($i)->nodeValue);
If str_replace doesn't work, try str_ireplace, as suggested in the comments.
If that still doesn't work, you could also try:
preg_replace("#&nbsp;#","",$times->item($i)->nodeValue);
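One caveat with the pattern above: it only matches the literal text "&nbsp;". If the value comes from a DOM node, the entity has most likely already been decoded to the character U+00A0, so a pattern that targets that character may be what is needed. A rough sketch, assuming the string is UTF-8 and using a hypothetical hard-coded value in place of $times->item($i)->nodeValue:

```php
<?php
// Hypothetical input: the text as DOM would return it, with U+00A0 written as its UTF-8 bytes.
$value = "6:45pm \xc2\xa09:30pm \xc2\xa0";

// Collapse each run of whitespace and/or U+00A0 into a single space.
// The "u" modifier makes PCRE treat both pattern and subject as UTF-8.
$clean = trim(preg_replace('/[\s\x{00A0}]+/u', ' ', $value));

var_dump($clean);
```

Replacing with a single space rather than an empty string keeps the two showtimes separated.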
You may be running into an encoding problem. See utf8_encode, or the hacky solution:
str_replace("Â","",$times->item($i)->nodeValue);
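As for doing it inside the XPath query itself, one possible approach (a sketch, not tested against the real page) is to combine the XPath 1.0 functions translate() and normalize-space(). normalize-space() on its own does not treat U+00A0 as whitespace, so the character is first mapped to an ordinary space. The markup below is a hypothetical stand-in for the page:

```php
<?php
// Hypothetical snippet standing in for one showtime div from the page.
$html = '<div class="rst">6:45pm &nbsp;9:30pm &nbsp;</div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$node = $xpath->query('//div[@class="rst"]')->item(0);

// U+00A0 as UTF-8 bytes, embedded directly in the XPath string literal.
$nbsp = "\xc2\xa0";
$expr = 'normalize-space(translate(., "' . $nbsp . '", " "))';

// evaluate() returns a string here because the expression is string-typed.
$times = $xpath->evaluate($expr, $node);

var_dump($times);
```

In the question's loop this would mean calling $xpath->evaluate($expr, $times->item($i)) instead of reading nodeValue.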
Apollo