Trying to parse HTML with PHP, but getting no results

Date: 2014-05-26 18:50:08

Tags: php expression

I am parsing an HTML file for specific strings, but my result is an empty array. I am using a regular expression, here it is: /(?<=alt\=\")([А-Яа-я]+\s?[А-Яа-я]+)(?=\"\ssrc)/i

I am looking for Cyrillic characters. Here is a piece of the code I am parsing:

 <div id="s71" style="position:absolute; left:381px; top:95px; height: 9px;">
            <img id="imm4" alt="Алексеевская" src="/img/obana4.gif" style="cursor:hand; cursor:pointer; position: absolute;">
            <div class="border_round" onclick="JavaScript:mSel(4);" style="width: 67px; height: 9px; cursor:hand; cursor:pointer; position: absolute; top: 0px;"><img style="position: absolute;" src="/img/blank.gif" width="67" height="9" alt=""></div>
        </div>
        <div id="s123" style="position:absolute; left:292px; top:45px; height: 9px;">
            <img id="imm5" alt="Алтуфьево" src="/img/obana4.gif" style="cursor:hand; cursor:pointer; position: absolute;">
            <div class="border_round" onclick="JavaScript:mSel(5);" style="width: 54px; height: 9px; cursor:hand; cursor:pointer; position: absolute; top: 0px;"><img style="position: absolute;" src="/img/blank.gif" width="54" height="9" alt=""></div>
        </div>

Sorry, I forgot to add the PHP code. Here is my code:

//opening and read file which contains code to parse
$hndl = fopen("./metro.html","r");
$size =  filesize("./metro.html");      
$result = fread($hndl,$size);
fclose($hndl);
unset($hndl);


//trying to parse code with expression
$metro_mask = '/(?<=alt\=\")([А-Яа-я]+\s?[А-Яа-я]+)(?=\"\ssrc)/i';
$output = array();
preg_match_all($metro_mask,$result,$output); 
var_dump($output);

It outputs this:

array (size=2) 0 => array (size=0) empty 1 => array (size=0) empty
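
One possible cause of the empty result, assuming metro.html and the PHP source are both saved as UTF-8: without the `u` modifier PCRE works byte by byte, so a class like `[А-Яа-я]` describes a set of bytes rather than whole Cyrillic letters. An encoding mismatch between the two files could also leave nothing to match. Below is a minimal sketch using the `u` modifier and the `\p{Cyrillic}` script property. The `$sample` string is a shortened copy of the markup above, and the variable names are illustrative only:

```php
<?php
// Shortened sample of the markup from the question (assumed to be UTF-8).
$sample = '<img id="imm4" alt="Алексеевская" src="/img/obana4.gif">'
        . '<img src="/img/blank.gif" width="67" height="9" alt="">'
        . '<img id="imm5" alt="Алтуфьево" src="/img/obana4.gif">';

// The "u" modifier makes PCRE read pattern and subject as UTF-8;
// \p{Cyrillic} matches one character of the Cyrillic script.
$mask = '/(?<=alt=")\p{Cyrillic}+(?:\s\p{Cyrillic}+)*(?="\ssrc)/u';

$found = array();
preg_match_all($mask, $sample, $found);

// $found[0] holds the full matches, one per station name.
var_dump($found[0]);
```

The empty `alt=""` on the blank image is skipped because the pattern requires at least one Cyrillic character.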

I also tried this:

$hndl = fopen("./metro.html","r");      
$size =  filesize("./metro.html"); 
$result = fread($hndl,$size);       
fclose($hndl);      
unset($hndl);               

$html = new DOMDocument();          
$html->loadHTML($result);                   
$links = array();               
foreach($html->getElementsByTagName('a') as $link) {            
echo $link->getAttribute('href');       
}  

But the result is null. Where am I going wrong?
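
Note that the snippet above collects `href` values from `a` elements, while the station names in the sample markup sit in the `alt` attribute of `img` elements. Below is a minimal sketch of the DOM approach under that reading. The `<?xml encoding="UTF-8">` prefix is a common workaround to make `loadHTML` treat the input as UTF-8, and `$sample` is a shortened copy of the markup, assumed to be UTF-8:

```php
<?php
// Shortened sample of the markup from the question (assumed to be UTF-8).
$sample = '<div id="s71">'
        . '<img id="imm4" alt="Алексеевская" src="/img/obana4.gif">'
        . '<div class="border_round"><img src="/img/blank.gif" alt=""></div>'
        . '</div>';

// Collect libxml parse warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Prefix with an encoding hint so the Cyrillic text is not mangled.
$doc->loadHTML('<?xml encoding="UTF-8">' . $sample);
libxml_clear_errors();

// Gather every non-empty alt attribute from the img elements.
$names = array();
foreach ($doc->getElementsByTagName('img') as $img) {
    $alt = $img->getAttribute('alt');
    if ($alt !== '') {
        $names[] = $alt;
    }
}

var_dump($names);
```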

Finally! I solved the problem with a regular expression. All I needed was:

/(?<=alt\=\")\S+?\s?\S+?(?=\")/s

It gave me this result:

array (size=1)
  0 => 
    array (size=184)
      0 => string 'Авиамоторная' (length=24)
      1 => string 'Автозаводская' (length=26)
      2 => string 'Академическая' (length=26)
      3 => string 'Александровский сад' (length=37)
      4 => string 'Алексеевская' (length=24)
      5 => string 'Алтуфьево' (length=18)
      6 => string 'Аннино' (length=12)
      7 => string 'Арбатская' (length=18)
      8 => string 'Арбатская' (length=18)
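
The `length` values that `var_dump` prints are byte counts, not character counts. Each Cyrillic letter takes two bytes in UTF-8, which is why the 12-letter 'Авиамоторная' is reported with length 24. A small sketch, assuming the mbstring extension is available and the source file is saved as UTF-8:

```php
<?php
$name = 'Авиамоторная';

// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
$bytes = strlen($name);
$chars = mb_strlen($name, 'UTF-8');

echo $bytes, ' bytes, ', $chars, ' characters', PHP_EOL;
```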

Before I go, I want to say, as we Russians put it: “терпение и труд - все перетрут” (patience and hard work will grind through anything) ))

0 answers:

No answers