Question

I have the following string:

blah blah yo<desc>some text with description - unwanted 
text</desc>um hey now some words yah<desc>some other description text 
stuff - more unwanted here</desc>random word and ; things. Now a hyphen 
outside of desc tag - with other text<desc>yet another description - unwanted
</desc>and that's about it.

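(Note: the actual string contains no line breaks or carriage returns. I added them here only for readability.)

I want to select only the text inside the desc tags from the hyphen onward, including the preceding whitespace and the closing desc tag. That part is easy enough, since I just did this: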

\s-.*?<\/desc>

Now, the problem is that hyphens outside the desc tags are selected as well. So my full set of selections looks like this:

- unwanted text</desc>
- more unwanted here</desc>
- with other text<desc>yet another description - unwanted</desc>

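So the first two are perfect, but see how the last one is messed up because there is a - outside the desc tag?

FYI, in case it is of interest, my code does the replacement like this: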

$text = preg_replace('/\s-.*?<\/desc>/', '</desc>', $text);

I tried a few lookbehind approaches but couldn't get them to work.

Any ideas?

Thanks! Mark

Answer 1

How about:

\s-[^-]*?<\/desc>
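
To see what this pattern does, here is a minimal sketch that runs it through preg_replace on the sample string from the question (joined into one line, and assuming the final tag is a closing </desc>). The variable names are only illustrative.

```php
<?php
// Sample input from the question, joined into a single line.
$text = "blah blah yo<desc>some text with description - unwanted text</desc>"
      . "um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>"
      . "random word and ; things. Now a hyphen outside of desc tag - with other text"
      . "<desc>yet another description - unwanted</desc>and that's about it.";

// [^-]*? cannot cross a second hyphen, so each match has to start
// at the last whitespace-hyphen pair before a closing </desc>.
$result = preg_replace('/\s-[^-]*?<\/desc>/', '</desc>', $text);

echo $result, PHP_EOL;
```

One caveat: if the unwanted text itself contains a hyphen (for example "un-wanted"), this pattern will not match that desc block at all, so it only suits input where the separator is the last hyphen before the closing tag.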

Answer 2

If desc is the only tag that can appear in this block, you can use a horrible hack like this:

$text = preg_replace('/\s-[^<]*?<\/desc>/', '</desc>', $text);

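However, if this needs to be bulletproof, you cannot do it reliably with a regular expression. You could try an XML parser and work with the resulting DOM instead.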
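
The DOM route might look roughly like the sketch below. It assumes the fragment is well-formed XML once wrapped in a made-up <root> element (so no stray & or unclosed tags), and stripDescSuffix is a hypothetical helper name, not something from the question.

```php
<?php
// Hypothetical helper: removes " - ..." from the text of every <desc> element.
function stripDescSuffix(string $fragment): string
{
    $doc = new DOMDocument();
    // The fragment has no single root element, so wrap it in an invented <root>.
    if (!$doc->loadXML('<root>' . $fragment . '</root>')) {
        throw new InvalidArgumentException('Fragment is not well-formed XML');
    }

    foreach ($doc->getElementsByTagName('desc') as $desc) {
        // Cut everything from the first whitespace-hyphen pair to the end of the text.
        $clean = preg_replace('/\s-.*$/s', '', $desc->textContent);
        // Replace the element's children with a single text node holding the result.
        while ($desc->firstChild) {
            $desc->removeChild($desc->firstChild);
        }
        $desc->appendChild($doc->createTextNode($clean));
    }

    // Serialize the children of <root> back into one string.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

$text = "blah blah yo<desc>some text with description - unwanted text</desc>"
      . "um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>"
      . "random word and ; things. Now a hyphen outside of desc tag - with other text"
      . "<desc>yet another description - unwanted</desc>and that's about it.";

echo stripDescSuffix($text), PHP_EOL;
```

Because the hyphen check only ever runs on text that the parser has already placed inside a desc element, hyphens outside the tags are never considered.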

Answer 3

You could try [^-<>]* in place of .*?. This restricts what the regex can match and effectively treats angle brackets and hyphens as boundary markers.
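
As a rough comparison, the sketch below lists what the original pattern and the restricted one select from the question's sample string (joined into one line, assuming the last tag is a closing </desc>). The variable names $loose and $strict are just for illustration.

```php
<?php
$text = "blah blah yo<desc>some text with description - unwanted text</desc>"
      . "um hey now some words yah<desc>some other description text stuff - more unwanted here</desc>"
      . "random word and ; things. Now a hyphen outside of desc tag - with other text"
      . "<desc>yet another description - unwanted</desc>and that's about it.";

// Original pattern: .*? is free to run across <desc>, so the third
// match starts at the hyphen outside the tag.
preg_match_all('/\s-.*?<\/desc>/', $text, $loose);

// Restricted pattern: [^-<>]* stops at any hyphen or angle bracket,
// so a match can only start at the last hyphen inside a desc block.
preg_match_all('/\s-[^-<>]*<\/desc>/', $text, $strict);

print_r($loose[0]);
print_r($strict[0]);
```

Each match includes the leading whitespace character, since \s is part of the pattern.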

RegEx selecting more than I want (PHP)
