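PS: I can't use DOM or similar tools for this code, because XPath doesn't work on this HTML: it comes from a poorly maintained website and contains a lot of errors. Regex is the simplest approach for me.
I have the following HTML snippet, taken from the malformed page: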
<td width="11%">Train Number</Td>
<td width="16%">Train Name</td>
<td width="18%">Boarding Date <br>(DD-MM-YYYY)</td>
<td width="7%">From</Td>
<td width="7%">To</Td>
<td width="14%">Reserved Upto</Td>
<td width="21%">Boarding Point</Td>
<td width="6%">Class</Td>
</TR>
<TR>
<TD class="table_border_both">*12018</TD>
<TD class="table_border_both">DEHRADUN SHTBDI</TD>
<TD class="table_border_both"> 9- 9-2012</TD>
<TD class="table_border_both">DDN </TD>
<TD class="table_border_both">RK </TD>
<TD class="table_border_both">RK </TD>
<TD class="table_border_both">DDN </TD>
<TD class="table_border_both"> CC</TD>
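I want to get the values in the last 8 TDs using a regular expression. However, when I put the snippet in a heredoc, it doesn't match. How should I write it in the heredoc so that this pattern matches as is?
I want to do something like this: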
$trainpattern = <<<EOT
<td width="11%">Train Number</Td>
<td width="16%">Train Name</td>
<td width="18%">Boarding Date <br>[(]DD-MM-YYYY[)]</td>
<td width="7%">From</Td>
<td width="7%">To</Td>
<td width="14%">Reserved Upto</Td>
<td width="21%">Boarding Point</Td>
<td width="6%">Class</Td>
</TR>
<TR>
<TD class="table_border_both">[*]12018</TD>
<TD class="table_border_both">DEHRADUN SHTBDI</TD>
<TD class="table_border_both"> 9- 9-2012</TD>
<TD class="table_border_both">DDN </TD>
<TD class="table_border_both">RK </TD>
<TD class="table_border_both">RK </TD>
<TD class="table_border_both">DDN </TD>
<TD class="table_border_both"> CC</TD>
EOT;
$ret = preg_match("#$trainpattern#s",$filetext,$matches);
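Also, when I use only the first two lines and join them into a single line with \s+, it matches, but I am looking for a way to match the lines without joining them. Perhaps in this case I need to replace \n and \r\n with \s*.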
Answer 0 (score: 2)
To extract the values, you can use something like this:
<?php
// Note: I added the missing <TR> and </TR> tags so that both rows can be matched
$trainpattern = <<< EOT
<TR>
<td width="11%">Train Number</Td>
<td width="16%">Train Name</td>
<td width="18%">Boarding Date <br>(DD-MM-YYYY)</td>
<td width="7%">From</Td>
<td width="7%">To</Td>
<td width="14%">Reserved Upto</Td>
<td width="21%">Boarding Point</Td>
<td width="6%">Class</Td>
</TR>
<TR>
<TD class="table_border_both">[*]12018</TD>
<TD class="table_border_both">DEHRADUN SHTBDI</TD>
<TD class="table_border_both"> 9- 9-2012</TD>
<TD class="table_border_both">DDN </TD>
<TD class="table_border_both">RK </TD>
<TD class="table_border_both">RK </TD>
<TD class="table_border_both">DDN </TD>
<TD class="table_border_both"> CC</TD>
</TR>
EOT;
// $trs will contain each TR
$trs = array();
preg_match_all("|<tr>(.+)</tr>|Uis", $trainpattern, $trs);
// $keys will contain the TD values of the first TR (the header row)
preg_match_all("|<td.*>(.+)</td>|Uis", $trs[1][0], $keys);
// $values will contain the TD values of the second TR (the data row)
preg_match_all("|<td.*>(.+)</td>|Uis", $trs[1][1], $values);
// Pair each key with the value at the same position
$results = array();
foreach ($keys[1] as $index => $key) {
    if (isset($values[1][$index])) {
        $results[$key] = $values[1][$index];
    }
}
var_dump($results);
This will give you:
array(8) {
["Train Number"]=>
string(8) "[*]12018"
["Train Name"]=>
string(15) "DEHRADUN SHTBDI"
["Boarding Date <br>(DD-MM-YYYY)"]=>
string(10) " 9- 9-2012"
["From"]=>
string(4) "DDN "
["To"]=>
string(4) "RK "
["Reserved Upto"]=>
string(4) "RK "
["Boarding Point"]=>
string(4) "DDN "
["Class"]=>
string(3) " CC"
}
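As for the original question of matching the snippet as is, here is a minimal sketch. It assumes the only differences between the snippet and the page are line endings and indentation. The helper name `snippetToRegexBody` and the sample `$filetext` are hypothetical and not part of the answer above. Each line is escaped with `preg_quote()` and the lines are joined with `\s*`; a nowdoc is used instead of a heredoc so that nothing in the snippet is interpolated.

```php
<?php
// Hypothetical helper: turn a literal multi-line snippet into a regex body
// that tolerates any whitespace (\n, \r\n, tabs, spaces) between lines.
function snippetToRegexBody(string $snippet, string $delimiter = '#'): string
{
    // Split on any line ending, trim each line, and drop empty lines.
    $lines = preg_split('/\R/', trim($snippet));
    $lines = array_filter(array_map('trim', $lines), 'strlen');

    // Escape regex metacharacters such as ( ) and *, plus the delimiter.
    $quoted = array_map(function ($line) use ($delimiter) {
        return preg_quote($line, $delimiter);
    }, $lines);

    return implode('\s*', $quoted);
}

// Header row copied from the question (nowdoc: no interpolation, no escapes).
$header = <<<'EOT'
<td width="11%">Train Number</Td>
<td width="16%">Train Name</td>
<td width="18%">Boarding Date <br>(DD-MM-YYYY)</td>
<td width="7%">From</Td>
<td width="7%">To</Td>
<td width="14%">Reserved Upto</Td>
<td width="21%">Boarding Point</Td>
<td width="6%">Class</Td>
</TR>
EOT;

// Data row copied from the question.
$data = <<<'EOT'
<TR>
<TD class="table_border_both">*12018</TD>
<TD class="table_border_both">DEHRADUN SHTBDI</TD>
<TD class="table_border_both"> 9- 9-2012</TD>
<TD class="table_border_both">DDN </TD>
<TD class="table_border_both">RK </TD>
<TD class="table_border_both">RK </TD>
<TD class="table_border_both">DDN </TD>
<TD class="table_border_both"> CC</TD>
EOT;

// Assumed sample input: the same markup with Windows line endings and tabs.
$filetext = str_replace("\n", "\r\n\t", $header . "\n" . $data);

// Literal header, then the next <TR> and eight captured cells.
$cell    = '\s*<td[^>]*>(.*?)</td>';
$pattern = '#' . snippetToRegexBody($header) . '\s*<tr>' . str_repeat($cell, 8) . '#is';

// On a match, drop the full match ($m[0]) and trim the padding in each cell.
$values = preg_match($pattern, $filetext, $m)
    ? array_map('trim', array_slice($m, 1))
    : array();

print_r($values);
```

Only the fixed header is matched literally here, while the eight data cells are captured, so the same pattern should keep working when the train number or date changes. If the page also differs in spacing inside a line, this sketch will not match and the pattern would need to be loosened further.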
Answer 1 (score: 1)
Have you tried phpQuery? If you have ever used jQuery, it should be easy to pick up.
Example:
require 'phpQuery.php';
phpQuery::newDocumentHTML($trainpattern);
foreach (pq('td')->slice(-8) as $v) {
    $v = pq($v);
    var_dump((string)$v);
    var_dump((string)$v->attr('class'));
    # etc...
}
Output:
string(43) "[*]12018"
string(50) "DEHRADUN SHTBDI"
string(45) " 9- 9-2012"
string(39) "DDN "
string(39) "RK "
string(39) "RK "
string(39) "DDN "
string(38) " CC"