Unable to decode HTML entities in a title

Time: 2011-05-26 21:38:18

Tags: php encoding utf-8 decode html-entities

I can't decode the entities in the title of this YouTube video:

http://www.youtube.com/watch?v=p7NMsywVQhY

Here is my code:

$url = 'http://www.youtube.com/watch?v=p7NMsywVQhY';
$html = @file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);

$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;

//decode the hex numeric entity (the &#x...; kind) in the title
$title = html_entity_decode($title,ENT_QUOTES,'UTF-8'); //does not seem to have any effect
//decode the utf data
$title = utf8_decode($title);

$title comes back fine, except that a question mark appears where the entity originally was in the title.

Thanks.

2 answers:

Answer 0 (score: 1)

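I don't know whether PHP provides a built-in function for this, but you can use preg_replace_callback like this:

// The /e modifier was removed in PHP 7; mb_chr (PHP 7.2+, mbstring) also handles code points above 255, unlike chr.
$string = preg_replace_callback('/&#x([0-9a-f]+);/i', function ($m) { return mb_chr(hexdec($m[1]), 'UTF-8'); }, $string);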
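As a possible alternative to a hand-written regex, html_entity_decode also understands hexadecimal numeric references when it is given a raw string and a UTF-8 charset. A minimal sketch, assuming the entity involved is something like &#x202a; (the sample string and the variable names are made up for illustration):

```php
<?php
// Hypothetical sample: a raw title that still contains a hex numeric entity.
$raw = '&#x202a;Title';

// Decode named and numeric entities into UTF-8 text.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');

// Inspect the first three bytes: U+202A is encoded as E2 80 AA in UTF-8.
$firstBytes = bin2hex(substr($decoded, 0, 3));
echo $firstBytes, "\n";
```

This only has a visible effect on text that still contains the entity. A string read from DOMDocument's nodeValue has normally been decoded already, which would explain why the call in the question appears to do nothing.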

Answer 1 (score: 0)

Try this to force the charset to be detected correctly:

$doc = new DOMDocument();
@$doc->loadHTML('<?xml encoding="UTF-8">' . $html);

$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;

echo $title;
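
The question mark itself most likely comes from the utf8_decode call rather than from the parser: converting to ISO-8859-1 replaces any character that charset cannot represent with "?". Here is a rough sketch of that effect, and of keeping the title in UTF-8 instead. The inline $html string is a hypothetical stand-in for the downloaded page, U+202A is an assumed example of the offending character, and mb_convert_encoding is used in place of utf8_decode because the latter is deprecated as of PHP 8.2:

```php
<?php
// Hypothetical stand-in for the downloaded page; the real YouTube markup differs.
$html = '<html><head><title>&#x202a;Some title&#x202c; - YouTube</title></head><body></body></html>';

$doc = new DOMDocument();
// Same prefix trick as above, so the input is treated as UTF-8.
@$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
$title = $doc->getElementsByTagName('title')->item(0)->nodeValue;

// The parser has already decoded the entity, so $title contains U+202A as UTF-8.
$hasMark = strpos($title, "\u{202A}") !== false;

// Converting to ISO-8859-1 substitutes "?" for characters that charset lacks.
$latin1 = mb_convert_encoding($title, 'ISO-8859-1', 'UTF-8');

// Alternative: stay in UTF-8 and strip the bidirectional formatting characters.
$clean = preg_replace('/[\x{200E}\x{200F}\x{202A}-\x{202E}]/u', '', $title);

echo $clean, "\n";
```

If the page that displays the title is itself served as UTF-8, dropping the conversion to ISO-8859-1 should be enough. The preg_replace step is only needed when the invisible direction marks are unwanted.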