我想在另一个网站上找到一个字符串。我一直在寻找解析器,我不知道最好的方法。我查看了一个HTML DOM解析器,但我只需要一个简单的一行输出。我只想将“url:'http://s2.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06'”链接到变量。
<script>
flowplayer("player", "http://www.example.com/flowplayer-3.2.16.swf", {
canvas: {
backgroundGradient: "none",
backgroundColor: "#000000"
},
clip: {
provider: 'lighttpd',
url: 'http://s1.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06',
scaling: 'fit'
},
plugins: {
lighttpd: {
url: 'http://www.example.com/flowplayer.pseudostreaming-3.2.12.swf'
}
}
});
</script>
答案 0 :(得分:0)
这是一个方便的函数,用于从两个分隔符之间抓取文本;
<?php
function extract_unit($string, $start, $end)
{
$pos = stripos($string, $start);
$str = substr($string, $pos);
$str_two = substr($str, strlen($start));
$second_pos = stripos($str_two, $end);
$str_three = substr($str_two, 0, $second_pos);
$unit = trim($str_three); // remove whitespaces
return $unit;
}
echo extract_unit($webpageSource, 'flowplayer("player", "', '", {');
?>
答案 1 :(得分:0)
我会使用DOMDocument
:
为了从锚点获取链接,它是:
$dd = new DOMDocument;
@$dd->loadHTMLFile('http://s2.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06');
if($a = $dd->getElementsByTagName('a')){
foreach($a as $t){
$links[] = $t->getAttribute('href');
}
}
现在$links
是一个数组,每个href
,或if(!isset($links))
没有结果。
从脚本标记获取JSON:
$dd = new DOMDocument;
@$dd->loadHTMLFile('http://s2.example.com/streams/i23374.mp4?k=12f34588cf171f3bbf3d35da4db43b06');
if($s = $dd->getElementsByTagName('script')){
$c = $dd->sameHTML($s->item(0)));
}
将item(0)
更改为其页面上script
标记的级别。现在$c
是一个字符串。所以:
preg_match_all("/url: '.+'/", $c, $results);
$results
是一个数组应该包含url: 'whatever'
。
所以:
foreach($results as $v){
$a[] = preg_replace('/url: /', '', $v);
}
$a
是结果数组。
答案 2 :(得分:0)
主要是RegExp是解析字符串的最佳方法,虽然不建议它处理JSON。
这是一个例子(我编码了字符串,它与原始HTML相同):
<?php
$data = base64_decode("PHNjcmlwdD4KICAgICAgICAgICAgICAgIGZsb3dwbGF5ZXIoInBsYXllciIsICJodHRwOi8vd3d3LmV4YW1wbGUuY29tL2Zsb3dwbGF5ZXItMy4yLjE2LnN3ZiIsICB7CiAgICAgICAgICAgICAgICAgICAgY2FudmFzOiB7CiAgICAgICAgICAgICAgICAgICAgICAgIGJhY2tncm91bmRHcmFkaWVudDogIm5vbmUiLAogICAgICAgICAgICAgICAgICAgICAgICBiYWNrZ3JvdW5kQ29sb3I6ICIjMDAwMDAwIgogICAgICAgICAgICAgICAgICAgIH0sCiAgICAgICAgICAgICAgICAgICAgY2xpcDogewogICAgICAgICAgICAgICAgICAgICAgICBwcm92aWRlcjogJ2xpZ2h0dHBkJywKICAgICAgICAgICAgICAgICAgICAgICAgdXJsOiAnaHR0cDovL3MxLmV4YW1wbGUuY29tL3N0cmVhbXMvaTIzMzc0Lm1wND9rPTEyZjM0NTg4Y2YxNzFmM2JiZjNkMzVkYTRkYjQzYjA2JywKICAgICAgICAgICAgICAgICAgICAgICAgc2NhbGluZzogJ2ZpdCcKICAgICAgICAgICAgICAgICAgICB9LAogICAgICAgICAgICAgICAgICAgIHBsdWdpbnM6IHsKICAgICAgICAgICAgICAgICAgICAgICAgbGlnaHR0cGQ6IHsKICAgICAgICAgICAgICAgICAgICAgICAgICAgIHVybDogJ2h0dHA6Ly93d3cuZXhhbXBsZS5jb20vZmxvd3BsYXllci5wc2V1ZG9zdHJlYW1pbmctMy4yLjEyLnN3ZicKICAgICAgICAgICAgICAgICAgICAgICAgfQogICAgICAgICAgICAgICAgICAgIH0KICAgICAgICAgICAgICAgIH0pOwogICAgICAgICAgICA8L3NjcmlwdD4=");
if(preg_match('/clip:\s*\{[\s\S]+url:\s*\'(\S+)\',\s*scaling/', $data, $match) === 1)
echo $match[1];
?>
虽然它是用JSON编码的,但它不能被PHP的json_decode
解析,因为PHP的JSON格式太严格(属性应该用引号括起来)。