从html页面我需要从所有锚链接中提取v的值...每个锚链接隐藏在大约5个div标签中
<a href="/watch?v=value to be retrived&list=blabla&feature=plpp_play_all">
每个v值有11个字符,因此截至目前我正在尝试逐字符地读取它,如
<?php
$file=fopen("xx.html","r") or exit("Unable to open file!");
$d='v';
$dd='=';
$vd=array();
while (!feof($file))
{
$f=fgetc($file);
if($f==$d)
{
$ff=fgetc($file);
if ($ff==$dd)
{
$idea='';
for($i=0;$i<=10;$i++)
{
$sData = fgetc($file);
$id=$id.$sData;
}
array_push($vd, $id);
这是获取v的每个字符并将其存储在sData变量中并将其推入id以便将这11个字符作为字符串(id)... 问题是...通过整个html文件搜索'v ='如果发现读取11个字符并将其推入sData数组就是吸吮,这需要花费相当多的时间...所以请帮助我复杂化这些东西
答案 0 :(得分:2)
<?php
function substring(&$string,$start,$end)
{
$pos = strpos(">".$string,$start);
if(! $pos) return "";
$pos--;
$string = substr($string,$pos+strlen($start));
$posend = strpos($string,$end);
$toret = substr($string,0,$posend);
$string = substr($string,$posend);
return $toret;
}
$contents = @file_get_contents("xx.html");
$old="";
$videosArray=array();
while ($old <> $contents)
{
$old = $contents;
$v = substring($contents,"?v=","&");
if($v) $videosArray[] = $v;
}
//$videosArray is array of v's
?>
答案 1 :(得分:0)
我最好使用SimpleXML和XPath解析HTML:
// Get your page HTML string
$html = file_get_contents('xx.html');
// As per comment by Gordon to suppress invalid markup warnings
libxml_use_internal_errors(true);
// Create SimpleXML object
$doc = new DOMDocument();
$doc->strictErrorChecking = false;
$doc->loadHTML($html);
$xml = simplexml_import_dom($doc);
// Find a nodes
$anchors = $xml->xpath('//a[contains(@href, "v=")]');
foreach ($anchors as $a)
{
$href = (string)$a['href'];
$url = parse_url($href);
parse_str($url['query'], $params);
// $params['v'] contains what we need
$vd[] = $params['v']; // push into array
}
// Clear invalid markup error buffer
libxml_clear_errors();