I'm new to regular expressions. What is the correct pattern for matching this HTML markup?
<tr class="calendar_row" data-eventid="39654">
<td class="alt1 eventDate smallfont" align="center"/></td>
<td class="alt1 smallfont" align="center">3:34am</td>
<td class="alt1 smallfont" align="center">CNY</td>
</tr>
This is what I'm using:
// $html = website HTML fetched from a URL
$match = array();
$pattern = "/(<tr.*?\data-eventid\>.*?<\/tr>)/ims";
preg_match_all($pattern, $html, $match);
But it doesn't work :| I just want to select everything inside that tr element.
Best regards.
Answer 0 (score: 5)
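You shouldn't use regular expressions for something like this. Instead, create a DOMDocument from your markup and select the children of the element you care about. For example, the following gives you the combined HTML of every <td> tag in the markup: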
// Our HTML will eventually go here
$innerHTML = "";
// Create a new DOMDocument based on our HTML
$document = new DOMDocument;
$document->loadHTML($html);
// Get a NodeList of all <td> Elements
$cells = $document->getElementsByTagName("td");
// Cycle over each <td>, adding its HTML to $innerHTML
foreach ($cells as $cell) {
    $innerHTML .= $document->saveHTML($cell);
}
// Output our glorious HTML
echo $innerHTML;
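The snippet above collects every <td> in the document. To restrict the result to rows that carry a data-eventid attribute, as the question asks, one option is an XPath query. The following is a minimal sketch rather than a definitive implementation: the sample markup is copied from the question and wrapped in a <table> purely for illustration, the $rows array is a hypothetical name, and suppressing libxml warnings is an assumption about how messy the fetched page is.

```php
<?php
// Sample markup from the question, wrapped in <table> for illustration only.
// In practice $html would be the page fetched from the URL.
$html = <<<'MARKUP'
<table>
<tr class="calendar_row" data-eventid="39654">
<td class="alt1 eventDate smallfont" align="center"></td>
<td class="alt1 smallfont" align="center">3:34am</td>
<td class="alt1 smallfont" align="center">CNY</td>
</tr>
</table>
MARKUP;

$document = new DOMDocument;
// Real-world HTML often triggers parser warnings; collect them silently
libxml_use_internal_errors(true);
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);

// Hypothetical result array: event id => inner HTML of that row
$rows = [];
// Select only <tr> elements that have a data-eventid attribute
foreach ($xpath->query('//tr[@data-eventid]') as $row) {
    $inner = "";
    // Serialize each child node to build the row's inner HTML
    foreach ($row->childNodes as $child) {
        $inner .= $document->saveHTML($child);
    }
    $rows[$row->getAttribute('data-eventid')] = trim($inner);
}

print_r($rows);
```

Keying the array by event id makes it easy to look up one row later. Note that PHP converts numeric string keys such as "39654" to integers.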
If you really do want to use preg_match to get the content between the tr tags, the following should work:
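// Our pattern for capturing everything between the first <tr> and its closing </tr>
// (.*? is lazy, so the match stops at the nearest </tr> rather than the last one)
$pattern = "/<tr[^>]*>(.*?)<\/tr>/s";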
// If a match is found, store the results in $match
if (preg_match($pattern, $html, $match)) {
    // Show the captured value
    echo $match[1];
}
The result is as follows:
<td class="alt1 eventDate smallfont" align="center"></td>
<td class="alt1 smallfont" align="center">3:34am</td>
<td class="alt1 smallfont" align="center">CNY</td>
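As for why the pattern in the question fails: in `\data-eventid\>`, the `\d` is read as "any digit", so the pattern looks for a digit followed by `ata-eventid`, and the `\>` then demands a `>` straight after the attribute name, leaving no room for `="39654"`. If the page holds several rows and a regex is still preferred, a sketch along the following lines may help. It assumes the attribute value is always a double-quoted number. The second sample row and the $events array are made up for illustration.

```php
<?php
// Two sample rows; the second row and its event id are invented for illustration.
$html = <<<'MARKUP'
<tr class="calendar_row" data-eventid="39654">
<td class="alt1 smallfont" align="center">3:34am</td>
<td class="alt1 smallfont" align="center">CNY</td>
</tr>
<tr class="calendar_row" data-eventid="39655">
<td class="alt1 smallfont" align="center">4:00am</td>
<td class="alt1 smallfont" align="center">USD</td>
</tr>
MARKUP;

// [^>]* keeps the attribute search inside the opening tag;
// (.*?) is lazy, so each match stops at the nearest </tr>.
$pattern = '/<tr\b[^>]*\bdata-eventid="(\d+)"[^>]*>(.*?)<\/tr>/is';

// Hypothetical result array: event id => inner HTML of that row
$events = [];
if (preg_match_all($pattern, $html, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // $m[1] is the event id, $m[2] is the row's inner HTML
        $events[$m[1]] = trim($m[2]);
    }
}

print_r($events);
```

This still breaks on nested tables or unusual attribute quoting, which is the main reason the DOMDocument route is the safer choice.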