I need a regular expression to extract the values described below, but my attempt does not work.
The HTML is as follows:
<body style="background: #FFF; padding-left: 5px;">
<form name="form1" method="post" action="verify()" id="form1">
<div>
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="/asdasfafasf/9Q2w==" />
</div>
<div>
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="/wEWAwKb/LCHCALs0bLrBgKM54rGBulKe8VRM9SNhTfqyz0GubMFea7i" />
</div>
<div class="nicer">
<input name="TextBox1" type="text" value="asdf44" id="TextBox1" placeholder="Ingresa tu patente" />
</div>
<p class="sample">
<br /> sample: asdasd34 ó ABCD12
<br /> Para . Ej. AB<strong style="font-weight: bold !importand;">0</strong>123</p>
<p>
<input type="submit" name="Button1" value="Consultar" id="Button1" class="button orange_btn small_btn" />
</p>
<h3><span id="Label1" class="infractions_report">result: asdf44</span></h3>
<div>
<table cellspacing="0" rules="all" border="1" id="GridView1" style="border-collapse:collapse;">
<tr>
<th scope="col">date</th>
<th scope="col">category</th>
<th scope="col">statusok</th>
</tr>
<tr class="txt">
<td>10-08-2015</td>
<td>1</td>
<td>cs nor</td>
</tr>
<tr class="txt">
<td>04-08-2015</td>
<td>1</td>
<td>cs nor2</td>
</tr>
<tr class="txt">
<td>01-08-2015</td>
<td>1</td>
<td>cs nor3</td>
</tr>
<tr class="txt">
<td>30-07-2015</td>
<td>1</td>
<td>cs nor4</td>
</tr>
<tr class="txt">
<td>19-06-2015</td>
<td>1</td>
<td>cn nor5</td>
</tr>
</table>
</div>
</form>
</body>
The PHP code is as follows:
$expresiondate = '/\<tr\>[\s]*\<td class\=\"txt\"\>[\s]*([^\s\<\/]*)/is';
preg_match_all($expresiondate , $buffer, $exit1);
$expresionCategory= '/\-[\d]{4}[\s]*<\/td\>[\s]*\<td class\=\"txt\"\>[\s]*([^\s\<\/]*)/is';
preg_match_all($expresionCategory, $buffer, $exit2);
$expresionstatus= '/\>[\s]*[\d]*[\s]*<\/td\>[\s]*\<td class\=\"txt\"\>[\s]*([^\s\<\/]*)/is';
preg_match_all($expresionstatus, $buffer, $exit3);
The result I need is as follows (the values are only samples, but the output should have this shape):
1. date:
array (
0 =>
array (
0 => '<td align="center">15/01/2016 00:22:16</td>',
1 => '<td align="center">16/01/2016 00:22:16</td>',
2 => '<td align="center">11/01/2015 00:22:16</td>',
),
1 =>
array (
0 => '15/01/2016',
1 => '16/01/2016',
2 => '11/01/2015',
),
)
2. category
array (
0 =>
array (
0 => '<td>10-08-2015</td><td>1</td><td>cs nor</td>',
1 => '<td>10-08-2015</td><td>1</td><td>cs nor</td>',
2 => '<td>10-08-2015</td><td>1</td><td>cs nor</td>',
),
1 =>
array (
0 => '1',
1 => '1',
2 => '1',
),
)
3. status
array (
0 =>
array (
0 => '<td>10-08-2015</td><td>1</td><td>cs nor</td>',
1 => '<td>10-08-2015</td><td>1</td><td>cs nor</td>',
2 => '<td>10-08-2015</td><td>1</td><td>cs nor</td>',
),
1 =>
array (
0 => 'cn nor1',
1 => 'cn nor2',
2 => 'cn nor3',
),
)
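Answer 0 (score: 0)
Regular expressions are hard to explain.
I suggest using "named capturing groups" to grab the content of the table cells.
I came up with the following regular expression: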
$regexp = "/<td>(?P<date>(\d{2}-\d{2}-\d{4}))<\/td>\s+<td>(?P<category>\d{1})<\/td>\s+<td>(?P<status>.*)<\/td>/mi";
preg_match_all($regexp, $input_lines, $matches);
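At first glance this may look like a lot, but it is built from small parts.
Let's go through them one by one:
(?P<name_of_the_capturing_group>(regexp)) is the syntax for a named capturing group; each match is stored under that name as well as under its numeric index.
\s+ matches the whitespace (including line breaks) between one closing </td> and the next opening <td>.
(\d{2}-\d{2}-\d{4}) matches a date such as 10-08-2015: two digits, two digits, and four digits, separated by hyphens.
(?P<date>(\d{2}-\d{2}-\d{4})) captures that date under the name date.
\d{1} matches exactly one digit.
(?P<category>\d{1}) captures that digit under the name category.
.* matches any run of characters except a line break.
(?P<status>.*) captures that run under the name status.
Then just call preg_match_all.
After that, running var_dump($matches); should show the keys date, category, and status.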
Always remember: only Chuck Norris can parse HTML with regular expressions.
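The named-group approach above can be tried on a couple of rows from the question's table. This is only a minimal sketch: the variable name $html and the inline sample are assumptions, the status group uses [^<]* instead of .* so that it cannot run past the closing tag, and PREG_SET_ORDER is an optional flag chosen here so that each match comes back as one row.

```php
<?php
// Two rows copied from the question's table; $html is a hypothetical variable name.
$html = <<<HTML
<tr class="txt">
<td>10-08-2015</td>
<td>1</td>
<td>cs nor</td>
</tr>
<tr class="txt">
<td>04-08-2015</td>
<td>1</td>
<td>cs nor2</td>
</tr>
HTML;

// Same idea as the pattern above, with [^<]* for the status cell (an assumption,
// not the answer's original .*), written in single quotes to avoid escaping surprises.
$regexp = '/<td>(?P<date>\d{2}-\d{2}-\d{4})<\/td>\s+<td>(?P<category>\d)<\/td>\s+<td>(?P<status>[^<]*)<\/td>/i';

// PREG_SET_ORDER returns one array per matched row instead of one array per group.
preg_match_all($regexp, $html, $matches, PREG_SET_ORDER);

foreach ($matches as $row) {
    // Each named group is available under its name.
    echo $row['date'], ' | ', $row['category'], ' | ', $row['status'], PHP_EOL;
}
```

Without PREG_SET_ORDER the default ordering applies, and $matches['date'], $matches['category'], and $matches['status'] each hold one array of values per column, which is closer to the three separate arrays asked for in the question.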
Answer 1 (score: 0)
Once again, regular expressions are not the right tool for parsing HTML. Use the built-in tools designed for that job, DOMDocument and DOMXPath:
$url = 'page.html';

// Collect HTML parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTMLFile($url);

$xp = new DOMXPath($dom);

// Select only the data rows of the table (the header row has no class="txt").
$rowNodeList = $xp->query('//table[@id="GridView1"]/tr[@class="txt"]');

$results = [];

foreach ($rowNodeList as $rowNode) {
    $colNodeList = $rowNode->getElementsByTagName('td');
    $results[] = [
        'date'     => $colNodeList->item(0)->nodeValue,
        'category' => $colNodeList->item(1)->nodeValue,
        'status'   => $colNodeList->item(2)->nodeValue,
    ];
}

libxml_clear_errors();
print_r($results);
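If three separate lists are wanted, as in the question, the row-oriented $results can be split by column. The following is a sketch under assumptions: the HTML is an inline excerpt of the question's table loaded with loadHTML rather than a page.html file, and the variable names $dates, $categories, and $statuses are made up for illustration.

```php
<?php
// Inline excerpt of the question's table, so the sketch runs without page.html.
$html = '<table id="GridView1">'
      . '<tr><th scope="col">date</th><th scope="col">category</th><th scope="col">statusok</th></tr>'
      . '<tr class="txt"><td>10-08-2015</td><td>1</td><td>cs nor</td></tr>'
      . '<tr class="txt"><td>04-08-2015</td><td>1</td><td>cs nor2</td></tr>'
      . '</table>';

// Keep parser warnings about the HTML fragment out of the output.
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
libxml_clear_errors();

$xp = new DOMXPath($dom);
$results = [];

foreach ($xp->query('//table[@id="GridView1"]/tr[@class="txt"]') as $rowNode) {
    $cells = $rowNode->getElementsByTagName('td');
    $results[] = [
        'date'     => trim($cells->item(0)->nodeValue),
        'category' => trim($cells->item(1)->nodeValue),
        'status'   => trim($cells->item(2)->nodeValue),
    ];
}

// array_column pulls one key out of every row, giving one list per column.
$dates      = array_column($results, 'date');
$categories = array_column($results, 'category');
$statuses   = array_column($results, 'status');

print_r($dates);
print_r($categories);
print_r($statuses);
```

Keeping $results as the primary structure and deriving the per-column lists from it means a row's date, category, and status can never get out of step with each other.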