Question

I have some HTML code that contains the following:

<table class="qprintable2" width="100%" cellpadding="4" cellspacing="0" border="0">
content goes here !
</table>

I have this function to match the content inside the table tag:

function getTextBetweenTags($string, $tagname)
{
  $pattern = "/<table class=\"class1\" width=\"100%\" cellpadding=\"4\" cellspacing=\"0\" border=\"0\">(.*?)<\/$tagname>/"; 
  preg_match_all($pattern, $string, $matches);
  return $matches[1];
}

But it doesn't work, so I would be very grateful if you could give me a good pattern :(

Answer 1

You should avoid this, but you can use a regular expression:

preg_match('#<table[^>]+>(.+?)</table>#ims', $str, $matches);  // $matches[1] holds the inner content

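The tricks used here are:

the /ims modifiers: s makes "." match newlines too, i makes the match case-insensitive, and m is the multiline option (it affects ^ and $)
using # instead of / to delimit the regex, so you don't have to escape the HTML closing tag
using [^>]+ to keep the pattern unspecific and avoid listing the individual HTML attributes (more reliable)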
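
Putting these points together, a minimal sketch of the function from the question might look like the following. The `preg_quote` call, the `\b` word boundary, and the type hints are my additions rather than part of the answer above; the sample HTML is the snippet from the question.

```php
<?php
// Sketch: return the inner content of every <$tagname ...>...</$tagname> block.
function getTextBetweenTags(string $string, string $tagname): array
{
    // Escape the tag name for use in a regex; '#' is the delimiter used below.
    $tag = preg_quote($tagname, '#');

    // \b      keeps "table" from also matching e.g. "tablex"
    // [^>]*   skips whatever attributes the opening tag carries
    // (.*?)   lazily captures everything up to the first closing tag
    // s       lets "." span newlines, i ignores case
    $pattern = '#<' . $tag . '\b[^>]*>(.*?)</' . $tag . '>#is';

    preg_match_all($pattern, $string, $matches);
    return $matches[1];
}

$html = '<table class="qprintable2" width="100%" cellpadding="4" cellspacing="0" border="0">
content goes here !
</table>';

$contents = getTextBetweenTags($html, 'table');
echo trim($contents[0]), PHP_EOL; // prints: content goes here !
```

Because the capture is lazy, it stops at the first closing tag, so nested tables would still be cut short. That limitation is one reason a real parser is the safer choice.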

While this is a case where a regex works acceptably, the general consensus is that you should use QueryPath or phpQuery (or similar) to extract HTML. It's also simpler:

qp($html)->find("table")->text();  //would return just the text content
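
If installing QueryPath or phpQuery is not an option, PHP's bundled DOM extension can do a similar job. This is only a sketch of that alternative, not the QueryPath API. The helper name `getTableTexts` is hypothetical, and the sample markup wraps the text in a `<tr><td>` cell, which the snippet in the question does not have.

```php
<?php
// Sketch using the built-in DOM extension (ext-dom) instead of QueryPath.
function getTableTexts(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect parser warnings
    // internally instead of letting them be emitted.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('table') as $table) {
        // textContent is the concatenated text of all descendant nodes.
        $texts[] = trim($table->textContent);
    }
    return $texts;
}

// Assumed sample input: the question's table with a proper row and cell.
$html = '<table class="qprintable2" width="100%" cellpadding="4" cellspacing="0" border="0">'
      . '<tr><td>content goes here !</td></tr>'
      . '</table>';

print_r(getTableTexts($html));
```

Unlike the regex, this approach does not depend on how the attributes are written or on where line breaks fall.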

How to get HTML tag content using preg_match_all
