I can't decode the entities in the title of this YouTube video:
http://www.youtube.com/watch?v=p7NMsywVQhY
Here is my code:
$url = 'http://www.youtube.com/watch?v=p7NMsywVQhY';
$html = @file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
//decode the '&#x202a;' in the title
$title = html_entity_decode($title,ENT_QUOTES,'UTF-8'); //does not seem to have any effect
//decode the utf data
$title = utf8_decode($title);
$title comes back fine, except that a question mark appears where the entity originally was in the title.
Thanks.
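Answer 0 (score: 1)
I don't know whether PHP provides a built-in function for this, but you can use preg_replace_callback like this:
// The /e modifier was removed in PHP 7, so use a callback; mb_chr() (PHP 7.2+, mbstring) also handles code points above 255, which chr() cannot.
$string = preg_replace_callback('/&#x([0-9a-f]+);/i', function ($m) { return mb_chr(hexdec($m[1]), 'UTF-8'); }, $string);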
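A note on the likely cause, with a small sketch. DOMDocument's nodeValue already returns decoded UTF-8 text, so a later html_entity_decode call has nothing left to decode, and utf8_decode replaces every character outside ISO-8859-1 with a question mark. If the stray character is an invisible directional formatting mark (an assumption here), one option is to keep the string in UTF-8 and strip those marks instead. The helper name `clean_title` and the sample string are hypothetical, not taken from the posts above.

```php
<?php
// Hypothetical helper: decode any remaining entities to UTF-8, then remove
// the invisible bidi control characters U+202A..U+202E.
function clean_title(string $raw): string
{
    // Decode named and numeric entities; the result is a UTF-8 string.
    $decoded = html_entity_decode($raw, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // The /u flag makes PCRE treat pattern and subject as UTF-8.
    // preg_replace returns null on invalid UTF-8, so fall back to $decoded.
    return preg_replace('/[\x{202A}-\x{202E}]/u', '', $decoded) ?? $decoded;
}

// Made-up sample input, for illustration only.
$sample = '&#x202a;Some video title&#x202c; - YouTube';
echo clean_title($sample), "\n";
```

Note that the sketch never calls utf8_decode: the title stays in UTF-8 from start to finish, so no character has to be replaced by a question mark.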
Answer 1 (score: 0)
Try this to force the character set to be detected correctly:
$doc = new DOMDocument();
@$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->item(0)->nodeValue;
echo $title;
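The same idea can be tried offline as a rough sketch. It assumes an inline HTML string in place of the fetched page, and uses libxml_use_internal_errors rather than the @ operator to silence parser warnings. The helper name `extract_title` and the sample markup are hypothetical. The encoding prefix matters mainly when the page contains raw UTF-8 bytes without a charset declaration; the sample below uses only numeric entities so that it stays plain ASCII.

```php
<?php
// Hypothetical helper: return the text of the first <title>, or null if none.
function extract_title(string $html): ?string
{
    // Collect libxml warnings internally instead of suppressing them with @.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    // Prefix an encoding hint so the parser reads the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $node = $doc->getElementsByTagName('title')->item(0);
    // nodeValue is UTF-8 with character references already decoded.
    return $node === null ? null : $node->nodeValue;
}

// Made-up sample page, for illustration only.
$html = '<html><head><title>Caf&#xe9; &#x4e2d;&#x6587;</title></head><body></body></html>';
echo extract_title($html), "\n";
```

Because the returned string is already UTF-8, it can be echoed directly on a UTF-8 page; converting it to ISO-8859-1 afterwards would reintroduce the question marks.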