Count the occurrences of each pattern in a string

Asked: 2013-02-19 15:24:15

Tags: php string file preg-match

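I want to find a preg_match result and retrieve its information. After that match, I need to count how many HTML 'a' tags follow it, up to the next match. I store that number of 'a' tags together with the preg_match result in an array. All of this lives in a single file, and I can modify the file.

The goal is to identify each HTML 'a' tag and assign it a category (the match). The only problem is that I check the file line by line, whereas I would like to search it as one plain string. Here are my code and the file.

Function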

function retrieveCatAndNumber($filepath){

$nbTag = array();
$lines = file($filepath);

//print_r($lines);
$i = 0;
$countA = 0;
$lineNumber = 1;
$nbLines = count($lines);

// Loop through our array, show HTML source as HTML source; and line numbers too.
foreach ($lines as $line_num => $line) {

 // echo htmlspecialchars($line)."<br>";
  if(preg_match("/<a(.*?)<\/a>/s", $line, $matcheA)){
    $countA++;
  }//if A
 // echo "/<td class=pp bgcolor='#0051AB'>(.*?)<\/td>";
 $result = preg_match_all("/<td class=pp bgcolor='#0051AB'>(.*?)<\/td>/s", trim($line), $matcheTD);

  if($result == 1){
    //echo $result;
    $prev = $i - 1;
     //echo htmlspecialchars($line)."<BR>";
     $nbTag[$i][0] = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $matcheTD[0]);                                   

             print_r($matcheTD);
    echo "<br>";
     if($i != 0){
       $nbTag[$prev][1] = $countA;
       $countA = 0;
     }

     $i++;
  }//if TD

  if($lineNumber == $nbLines){
    $nbTag[$i - 1][1] = $countA;
    $countA = 0;
  }
  $lineNumber++;
}//Foreach line of file

echo "\t\t The category and number of links : ok<BR>";

return $nbTag;

} // retrieveNbTag

File contents

<is_links><!-- !!!!!! Project General Info !!!!!! --> <tr> <td class=pp bgcolor='#0051AB'>��Project Info</td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_proj_def.htm'; if ( file_exists ($filename) ){ ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem ?>/<?= $filename ?>">Project Definition</a> <?php } else { ?> <font color="#A0A0A0">Project Definition</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_kickoff.pdf'; if ( file_exists ($filename) ) { ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem ?>/<?= $filename ?>">Project Kickoff</a> <?php } else { ?> <font color="#A0A0A0">Project Kickoff</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/projects/pages_v1/logbook/logbook.is?mnem=<?= $mnem ?>">Logbook</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/projects/pages_v1/schedule/schedule.is?mnem=<?= $mnem ?>">Site Schedule</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/projects/pages_v1/assignee/assignee.is?mnem=<?=$mnem?>">Assignee</a> </td> </tr> <tr> <td class=pp> <img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> <a HREF=http://wss.cae.ca/montreal/departments/PMO/PM/2TLM/default.aspx> Sharepoint Site</A></TD></TR> <!-- !!!!!! IS Links !!!!!! 
--> <tr> <td class=pp bgcolor='#0051AB'> I/S Links</td> </tr> <TR><TD class='pp'><A HREF=../../usr2/html/proj_docs/2tlm/ARJ21_FFS_update_S0W.pdf>Project ARJ21_FFS_update_S0W</A></TD></TR><TR><TD class='pp'> <TR><TD class='pp'><A HREF=../../usr2/html/proj_docs/2tlm/ARJ21_Update_Schedule_rev4_2Oct2012.pdf>ARJ21_Update_Schedule_rev4_2Oct2012</A></TD></TR> <TR><TD class='pp'><A HREF=../../usr2/html/proj_docs/2tlm/2T3F_COMAC_ILC_MOC8_PedestalUpdate_PM_KO.pdf>2T3F COMAC ILC MOC8 PedestalUpdate PM KO</A></TD></TR> <TR><TD class='pp'><A HREF=../../usr2/html/proj_docs/2tlm/FAA_evaluation_S0W.pdf>FAA evaluation S0W</A></TD></TR> <!-- !!!!!! Technical Info !!!!!! --> <tr> <td class=pp bgcolor='#0051AB'>��Technical Info</td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="doors://caedrs01.cae.ca:36691/?version=2&prodID=0&urn=urn:telelogic::1-49ac26073662317a-F-000078c5">Requirements </a> </td> <td class=pp> <TR><TD class='pp'><A HREF=/proj_docs/2tlm/BP1122_R2.pdf>Tech Spec</A></TD></TR> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_tpd.pdf'; if ( file_exists ($filename) ) { ?> <TR><TD class='pp'><A HREF=/proj_docs/2tlm/BP1122_R2.pdf>Tech Spec</A></TD></TR> <font color="#A0A0A0">Tech Spec</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_sbdr.pdf'; if ( file_exists ($filename) ) { ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem ?>/<?= $filename ?>">SBDR</a> <?php } else { ?> <font color="#A0A0A0">SBDR</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_pdr.pdf'; if ( file_exists ($filename) ) { ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem ?>/<?= $filename?>">PDR</a> <?php } else { ?> <font color="#A0A0A0">PDR</font> <?php } 
?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif">� <?php $filename = './'.$mnem.'_cdr.pdf'; if ( file_exists ($filename) ) { ?> <a href="http://caevgl03.caecorp.cae.com/projects/<?= $mnem?>/<?= $filename ?>">CDR</a> <?php } else { ?> <font color="#A0A0A0">CDR</font> <?php } ?> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://wss.cae.ca/sites/cpm/System%20Engineering/Public/Forms/AllItems.aspx?RootFolder=%2fsites%2fcpm%2fSystem%20Engineering%2fPublic%2fArchitectures&FolderCTID=&View=%7b5136364F%2d69EF%2d4EC4%2dB754%2d9C4D8C60D8C8%7d/Projects/<?=$mnem?>">Architecture Dwg</a> </td> </tr> <!-- !!!!!! Test And Eval !!!!!! --> <tr> <td class=pp bgcolor='#0051AB'>��Test & Eval</td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/elog/elog.asp?PROJECT=<?=$mnem?>">eLOG</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://testeval.cae.ca/d75/eqtg2/eqtg2_main.asp?PROJ=<?=$mnem?>">eQTG</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=ATM">eATM</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=ITM">eITM</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=OTM">eOTM</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=MCD">eMCD</a> </td> </tr> <tr> <td 
class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://testeval.cae.ca/d75/edocs/etp_main.asp?PROJECT=<?=$mnem?>&TEST_PHASE=IHA">eTP</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/edocs/edocs.asp?PROJECT=<?=$mnem?>&DOC_TYPE=CHKL">eCHKL</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://testeval.cae.ca/d75/First_Flight/prerequ.asp?mnem=<?= $mnem ?>">FF Pre-requiFites</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com:8080/esnag/">eSnag</a> </td> </tr> <!-- !!!!!! Quick Links !!!!!! --> <tr> <td class=pp bgcolor='#0051AB'>��Quick Links</td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://web.cae.ca/dept-sites/dept48/Life_Cycle/Version%201.6/CAELC.htm">ISO</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/templates/Site_Etiquette.ppt">Site Etiquette</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://wss.cae.ca/montreal/departments/ProdAndTech/SIM%20XX1%20Support/Forms/AllItems.aspx?RootFolder=%2fmontreal%2fdepartments%2fProdAndTech%2fSIM%20XX1%20Support%2fCAELIB%20User%20Guides&FolderCTID=&View=%7bBFB10722%2d49AE%2d41DA%2dB20B%2d8A5899366423%7d">CAELib</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://d21app01.cae.ca/webfiles/scm/StarTeamWebPage/index.htm">Starteam Info</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://step1.cae.ca/kb_asp/step1_edt/step1_kb.htm">STEP 1</a> </td> </tr> <tr> <td class=pp> �<img 
src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://d21app01.cae.ca/webfiles/matrixxweb/mtrx_home.html">MATRIXx</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/templates/Site_code.doc">Field Site Code of ethics</a> </td> </tr> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/templates/Cultures.doc">Customer Etiquette</a> </td> </TR> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com">I/S Home</a> </td> </TR> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://caevgl03.caecorp.cae.com/dept62/pm/html/index.html">PM Home</a> </td> </TR> <tr> <td class=pp> �<img src="http://caevgl03.caecorp.cae.com/pc/images/bluedot.gif"> �<a href="http://web.cae.ca">CAE Web</a> </td> </TR> </is_links> 

The result I get (the $nbTag array) contains only one entry!

Array ( [0] => ��Project Info [1] => ��Project Info ) 

Thanks for your advice, and feel free to ask me any questions.

1 Answer:

Answer 0 (score: 0)

Don't try to parse an HTML file with regular expressions! Use a library to parse the HTML.

You can use http://www.php.net/manual/en/domdocument.loadhtml.php or some other library...
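
As a rough illustration of that suggestion, here is a minimal sketch that loads the markup with DOMDocument and walks the header cells and links in document order. The function name countLinksPerCategory and the $sample markup are hypothetical, made up for this example. It assumes the category headers are the td cells with bgcolor='#0051AB', as in the question, and that the fragment sits inside a table. It does not deal with the embedded PHP tags or the mis-encoded characters in the real file.

```php
<?php
// Hypothetical helper (not from the question): returns one entry per
// category header, shaped like the asker's $nbTag: [category text, link count].
function countLinksPerCategory(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // A single location path, so the nodes come back in document order.
    // loadHTML lowercases element and attribute names, so <A HREF=...> matches too.
    $nodes = $xpath->query("//*[self::a or (self::td and @bgcolor='#0051AB')]");

    $result = [];
    $current = -1; // index of the category currently being filled
    foreach ($nodes as $node) {
        if ($node->nodeName === 'td') {
            // A header cell starts a new category with a count of zero.
            $current++;
            $result[$current] = [trim($node->textContent), 0];
        } elseif ($current >= 0) {
            // A link after a header belongs to the most recent category.
            $result[$current][1]++;
        }
    }
    return $result;
}

// Made-up sample in the style of the file from the question.
$sample = <<<'HTML'
<table>
<tr><td class=pp bgcolor='#0051AB'>Project Info</td></tr>
<tr><td class=pp><a href="#">Logbook</a></td></tr>
<tr><td class=pp><A HREF=#>Sharepoint Site</A></TD></TR>
<tr><td class=pp bgcolor='#0051AB'>I/S Links</td></tr>
<TR><TD class='pp'><A HREF=doc.pdf>Document</A></TD></TR>
</table>
HTML;

print_r(countLinksPerCategory($sample));
```

Because the whole document is parsed at once, the count no longer depends on how the file happens to be split into lines, which appears to be what limits the line-by-line version. For a real file you would pass file_get_contents($filepath) instead of $sample.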