I am reading a Wikipedia XML file and need to strip the list markup from every list item. For example, given the following string:
String text = ": definition list\n"
        + "** some list item\n"
        + "# another list item\n"
        + "[[Category:1918 births]]\n"
        + "[[Category:2005 deaths]]\n"
        + "[[Category:Scottish female singers]]\n"
        + "[[Category:Billy Cotton Band Show]]\n"
        + "[[Category:Deaths from Alzheimer's disease]]\n"
        + "[[Category:People from Glasgow]]";
Here I want to remove the leading "*", "#" and ":" markers, but not the ":" inside the category links. The output should look like this:
String outtext = "definition list\n"
        + "some list item\n"
        + "another list item\n"
        + "[[Category:1918 births]]\n"
        + "[[Category:2005 deaths]]\n"
        + "[[Category:Scottish female singers]]\n"
        + "[[Category:Billy Cotton Band Show]]\n"
        + "[[Category:Deaths from Alzheimer's disease]]\n"
        + "[[Category:People from Glasgow]]";
I am using the following code:
Pattern pattern = Pattern.compile("(^\\*+|#+|;|:)(.+)$");
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    String outtext = matcher.group(0);
    outtext = outtext.replaceAll("(^\\*+|#+|;|:)\\s", "");
    return(outtext);
}
This doesn't work. Can you show me how I should do it?
Answer 0 (score: 0)
This should work:
text = text.replaceAll("(?m)^[*:#]+\\s*", "");
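The important part is the (?m) flag, which turns on MULTILINE mode. In that mode the ^ and $ anchors match at the start and end of every line, not only at the start and end of the whole input, so the markers are removed from each line.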
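A minimal, self-contained sketch of the difference the flag makes. It applies the answer's regex to the first few lines of the sample text, once without and once with (?m). The class name MultilineDemo and the helper strip are made up for illustration.

```java
public class MultilineDemo {
    // Hypothetical helper: applies the answer's regex, optionally prefixed with (?m).
    static String strip(String text, boolean multiline) {
        String regex = (multiline ? "(?m)" : "") + "^[*:#]+\\s*";
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String text = ": definition list\n"
                + "** some list item\n"
                + "# another list item\n"
                + "[[Category:1918 births]]";

        // Without (?m), ^ matches only at index 0, so only the first line changes.
        System.out.println(strip(text, false));
        System.out.println("----");
        // With (?m), ^ also matches right after each line terminator.
        System.out.println(strip(text, true));
    }
}
```

Without the flag, "** some list item" and "# another list item" keep their markers; the category line is left alone in both cases because it starts with "[", which is not in the character class.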
Output:
definition list
some list item
another list item
[[Category:1918 births]]
[[Category:2005 deaths]]
[[Category:Scottish female singers]]
[[Category:Billy Cotton Band Show]]
[[Category:Deaths from Alzheimer's disease]]
[[Category:People from Glasgow]]
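The same replacement can also be written with the Pattern.MULTILINE constant, which is the flag that (?m) embeds. Compiling the Pattern once lets you reuse it for every page in the XML dump. The sketch below is a variant built on two assumptions that go beyond the answer. First, it also strips ";", which the question's own pattern listed and which MediaWiki uses for definition-list terms. Second, it uses [ \t]* instead of \s*. Because \s also matches line breaks, a line containing only markers would otherwise lose its line break, along with any leading whitespace on the next line. The names WikiListStripper and stripListMarkers are hypothetical.

```java
import java.util.regex.Pattern;

public class WikiListStripper {
    // Pattern.MULTILINE is equivalent to the embedded (?m) flag.
    // Assumption: ';' counts as a list marker too, and only spaces/tabs
    // after the markers are removed, so line breaks are never consumed.
    private static final Pattern LIST_MARKERS =
            Pattern.compile("^[*#:;]+[ \\t]*", Pattern.MULTILINE);

    static String stripListMarkers(String text) {
        return LIST_MARKERS.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "; term\n"
                + ": definition list\n"
                + "*\n"                      // a marker-only line becomes an empty line
                + "** some list item\n"
                + "[[Category:People from Glasgow]]";
        System.out.println(stripListMarkers(text));
    }
}
```

If you would rather have marker-only lines disappear entirely, the answer's \s* already does that as a side effect.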