Question

I'm using the preg_match function in PHP to extract some values from an RSS feed. The feed content contains something like this:

<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>

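I need to get the "A text with non alphanumeric characters" and "more text with non alphanumeric characters" parts so that I can save them in a database. I don't know whether a regular expression is the best way to do this.

Thank you very much.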

Answer 1

If you want to use a regular expression (i.e. quick and dirty, not very maintainable), this will give you the text:

$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

// Match between tags
preg_match("#</strong>(.*?)</li>#", $input, $matches);
// Remove the text inside brackets
echo trim(preg_replace("#\s*\(.*?\)\s*#", '', $matches[1]));

However, this can fail if the parentheses are nested.
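
One way to handle the nested case is a recursive PCRE pattern. The following is only a sketch: the input string with a nested parenthetical is a made-up example and not something shown in the feed, and the variable names `$text`, `$pattern` and `$clean` are illustrative.

```php
<?php
// Hypothetical input containing a nested parenthetical (not from the original feed).
$text = 'A text (more (nested) text), more text (more text)';

// Group 1 matches "(" ... ")" where the body is either a run of
// non-parenthesis characters or, via (?1), another balanced group.
$pattern = '~\s*(\((?:[^()]++|(?1))*\))~';

// Remove every balanced parenthetical together with the whitespace before it.
$clean = trim(preg_replace($pattern, '', $text));

echo $clean, PHP_EOL; // A text, more text
```

Unbalanced parentheses are simply left in place by this pattern, so it is probably worth checking the result if the feed content is not under your control.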

Answer 2

Given that the structure is always the same, you can use this regular expression:

</strong>([^,]*),([^<]*)</li>

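Group 1 will contain the first fragment and group 2 will contain the other one.

Once you start parsing HTML/XML with regular expressions, it quickly becomes clear that a full parser is a better fit. For small or one-off solutions, a regular expression is fine.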
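
As a rough illustration of how that pattern could be used with preg_match, here is a minimal sketch. The `~` delimiters and the variable names `$first` and `$second` are choices made for this example. Note that both groups still include the leading space and the parenthetical part, so they are trimmed here and the "(more text)" would have to be stripped separately if it is not wanted.

```php
<?php
$input = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

$first = null;
$second = null;

// Group 1: everything after </strong> up to the first comma.
// Group 2: everything after that comma up to </li>.
if (preg_match('~</strong>([^,]*),([^<]*)</li>~', $input, $m)) {
    $first  = trim($m[1]);
    $second = trim($m[2]);
}

echo $first, PHP_EOL;  // A text with non alphanumeric characters (more text)
echo $second, PHP_EOL; // more text with non alphanumeric characters (more text)
```

Because group 1 stops at the first comma, this only works as long as the first fragment never contains a comma itself.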
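
For comparison, a parser-based version might look roughly like the sketch below. It assumes PHP's DOM extension is available and that the feed item has already been reduced to the HTML fragment shown in the question; the variable names are illustrative, and a small regular expression is still used to drop the parenthetical parts.

```php
<?php
$html = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';

$doc = new DOMDocument();
// Suppress libxml notices about the fragment not being a full HTML document.
$doc->loadHTML($html, LIBXML_NOERROR | LIBXML_NOWARNING);

$values = [];
foreach ($doc->getElementsByTagName('li') as $li) {
    // Collect only the text that sits directly inside <li>, skipping <strong>.
    $text = '';
    foreach ($li->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text .= $child->nodeValue;
        }
    }
    // Drop "(...)" parts, then split on the comma and trim each piece.
    $text = preg_replace('~\s*\([^)]*\)~', '', $text);
    $values[] = array_map('trim', explode(',', $text));
}

print_r($values);
```

The markup handling no longer depends on the exact tag layout, which is the main advantage over the pure regex approach; the comma split still assumes the fragments contain no commas of their own.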

Answer 3

$str = '<li><strong>Something:</strong> A text with non alphanumeric characters (more text), more text with non alphanumeric characters (more text)</li>';
$str = preg_replace('~^.*?</strong>~', '', $str); // Remove leading markup
$str = preg_replace('~</li>$~', '', $str); // Remove trailing markup
$str = preg_replace('~\([^)]++\)~', '', $str); // Remove text within parentheses
$str = trim($str); // Clean up whitespace
$arr = preg_split('~\s*,\s*~', $str); // Split on the comma

Looking for a regular expression in PHP
