Question

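I want to write a regular expression that removes every character before the first &emsp; and, if &emsp; is followed by (new section), removes that too. But the following regex doesn't seem to work. Why not, and how can I fix it?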

String removeEmsp =" &ldquo;[<centd>[</centd>]&sect;&ensp;431:10A&ndash;126&emsp;(new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
Pattern removeEmspPattern1 = Pattern.compile("(.*(&emsp;(\\(new section\\)))?)(.*)", Pattern.MULTILINE);
System.out.println(removeEmspPattern1.matcher(removeEmsp).replaceAll("$2"));
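The following sketch shows why the pattern above deletes the whole line. It runs the question's pattern once and inspects the groups. The class name GreedyDemo and the helper groups are made up for illustration. The greedy .* in group 1 consumes the entire input, so the optional &emsp;(new section) group has nothing left to match and stays null. As a result, replacing with $2 substitutes an empty string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {
    // The question's pattern, unchanged
    static final Pattern ORIGINAL = Pattern.compile(
            "(.*(&emsp;(\\(new section\\)))?)(.*)", Pattern.MULTILINE);

    // Returns groups 1, 2 and 4 of the first match (hypothetical helper)
    static String[] groups(String input) {
        Matcher m = ORIGINAL.matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match");
        }
        return new String[] {m.group(1), m.group(2), m.group(4)};
    }

    public static void main(String[] args) {
        String input = " &ldquo;[<centd>[</centd>]&sect;&ensp;431:10A&ndash;126"
                + "&emsp;(new section)[<centd>]Chemotherapy services.</centd>]"
                + " <centa>Cancer treatment.</centa>test snl.";
        String[] g = groups(input);
        // Greedy .* in group 1 runs all the way to the end of the input...
        System.out.println(g[0].equals(input)); // true
        // ...so the optional group 2 matches zero times and is null
        System.out.println(g[1]);               // null
        // ...and group 4 is left with an empty string
        System.out.println(g[2].isEmpty());     // true
        // "$2" expands to nothing for a null group, so the whole line is deleted
        System.out.println("[" + ORIGINAL.matcher(input).replaceAll("$2") + "]"); // []
    }
}
```

MULTILINE is kept only to match the question. It affects ^ and $, which this pattern does not use, so it makes no difference here.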

Answer 1

Try this:

String removeEmsp =" &ldquo;[<centd>[</centd>]&sect;&ensp;431:10A&ndash;126&emsp;(new section)[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.";
System.out.println(removeEmsp.replaceFirst("^.*?\\&emsp;(\\(new\\ssection\\))?", ""));
System.out.println(removeEmsp.replaceAll("^.*?\\&emsp;(\\(new\\ssection\\))?", ""));

Output:

[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.
[<centd>]Chemotherapy services.</centd>] <centa>Cancer treatment.</centa>test snl.

It removes everything up to and including the first "&emsp;" and, optionally, the "(new section)" text that follows it, if present.
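Below is a minimal sketch of the same approach with the pattern precompiled once. The class and method names are hypothetical. Compared with the question's pattern, the important change is the lazy .*? anchored by ^. It stops at the first &emsp; instead of running to the end of the line. Because ^ can only match at the start of the input, replaceFirst and replaceAll give the same result here.

```java
import java.util.regex.Pattern;

public class StripBeforeEmsp {
    // ^                      start of the input (no MULTILINE, so only index 0)
    // .*?                    as few characters as possible (lazy)...
    // &emsp;                 ...up to the first literal "&emsp;"
    // (\(new\ssection\))?    then, optionally, the literal "(new section)"
    private static final Pattern PREFIX =
            Pattern.compile("^.*?&emsp;(\\(new\\ssection\\))?");

    static String strip(String s) {
        return PREFIX.matcher(s).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(strip("abc&emsp;(new section)rest")); // rest
        System.out.println(strip("abc&emsp;rest&emsp;more"));    // rest&emsp;more
        System.out.println(strip("no marker here"));             // no marker here
    }
}
```

By default, "." does not match line terminators. If the input can span several lines, the pattern would need the Pattern.DOTALL flag.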

Answer 2

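Have you tried String.split? It creates an array of strings from a string, based on a delimiter.

Once you have split the string, just pick the array element you need for your print statement.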
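Here is one possible way to apply the split suggestion to this input. The names are hypothetical. Passing a limit of 2 to split cuts the string only at the first &emsp; and keeps everything after it in one piece. The optional (new section) prefix is then dropped with a plain startsWith check. Since split takes a regex, Pattern.quote is used so that the delimiter is matched literally.

```java
import java.util.regex.Pattern;

public class SplitApproach {
    static String strip(String s) {
        // limit 2: split only at the first "&emsp;" and keep the rest intact
        String[] parts = s.split(Pattern.quote("&emsp;"), 2);
        if (parts.length < 2) {
            return s; // no "&emsp;" found, leave the input unchanged
        }
        String rest = parts[1];
        String marker = "(new section)";
        // Drop the marker only if it immediately follows "&emsp;"
        return rest.startsWith(marker) ? rest.substring(marker.length()) : rest;
    }

    public static void main(String[] args) {
        System.out.println(strip("abc&emsp;(new section)rest")); // rest
        System.out.println(strip("abc&emsp;rest&emsp;more"));    // rest&emsp;more
        System.out.println(strip("no marker here"));             // no marker here
    }
}
```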


Answer 3

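Your regex is long and I don't want to debug it. Here is a hint, though: some characters have special meanings in regular expressions. For example, square brackets define character classes, and inside a character class && means "and" (intersection). If you want such characters to be treated as literal characters rather than as regex syntax, you must escape them. To escape a special character, put a \ in front of it. But \ is also Java's escape character in string literals, so it has to be doubled.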
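This short sketch illustrates these escaping rules using the parentheses from the question's (new section) marker. The class and helper names are made up. Unescaped parentheses form a capturing group, so they do not match literal parentheses. Pattern.quote is an alternative to escaping each character by hand.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // True if the whole input equals the given text, with every regex
    // metacharacter in that text treated literally (hypothetical helper)
    static boolean matchesLiterally(String input, String literal) {
        return input.matches(Pattern.quote(literal));
    }

    public static void main(String[] args) {
        // Unescaped, '.' matches any character
        System.out.println("axb".matches("a.b"));                          // true
        // "\\." in Java source becomes \. in the regex: a literal dot only
        System.out.println("axb".matches("a\\.b"));                        // false
        // Unescaped parentheses form a group instead of matching '(' and ')'
        System.out.println("(new section)".matches("(new section)"));      // false
        System.out.println("(new section)".matches("\\(new section\\)"));  // true
        System.out.println(matchesLiterally("(new section)", "(new section)")); // true
    }
}
```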

For example, to replace the ampersand with the letter A, you would write str.replaceAll("\\&", "A")
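Below is the ampersand example as a runnable program. The names are hypothetical. In Java, the backslash before & is allowed but not required, because & only has a special meaning as && inside a character class. String.replace is shown as a regex-free alternative for plain literal replacements.

```java
public class AmpersandDemo {
    static String viaRegex(String s) {
        // "\\&" in Java source is \& in the regex; the escape is harmless here
        return s.replaceAll("\\&", "A");
    }

    static String viaLiteral(String s) {
        // String.replace treats both arguments as plain text, no regex involved
        return s.replace("&", "A");
    }

    public static void main(String[] args) {
        System.out.println(viaRegex("a&b&c"));   // aAbAc
        System.out.println(viaLiteral("a&b&c")); // aAbAc
    }
}
```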

Now you have all the information you need. Start with a simple regex and then extend it into what you need. Good luck.

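EDIT: BTW, it is possible to parse XML and/or HTML with regular expressions, but doing so is strongly discouraged. Use a dedicated parser for these formats.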

Remove everything in a string up to a certain substring, and optionally a following string if present
