Splitting a string to get the word separators

Time: 2017-01-22 11:40:30

Tags: java regex

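I want to find all the separators between the words of a sentence. A separator can be made of spaces or line breaks.

Say I have the following string: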

String text = "hello, darkness   my old friend.\nI've   come to you again\r\nasd\n 123123";

String[] separators = text.split("\\S+");

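Output: [, , , , , , , , , , , ]

So I split on everything that is not whitespace. The first element returned is an empty separator; the rest are fine. Why is the first string empty?

I would also like periods and commas to count as part of a separator, so that ".\n" is a single separator, but I don't know how to do that.

Desired output for the string above: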

 separators = {", ", "   ", " ", " ", ".\n", "   ", " ", " ", " ", "\r\n", "\n "}

or, with the punctuation kept separate from the whitespace:

 separators = {",", " ", "   ", " ", " ", ".", "\n", "   ", " ", " ", " ", "\r\n", "\n "}

3 answers:

Answer 0: (score: 0)

I think this would also work fine:

String[] separators = text.split("\\w+");
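
A quick way to check this suggestion is to print what it returns for the string from the question. The sketch below is only illustrative; the class name WordCharSplitCheck and the helper separators are made up for the example. Because \w does not match the apostrophe, "I've" is cut in two and "'" is reported as a separator, and the leading empty string is still present.

```java
public class WordCharSplitCheck {
    // Hypothetical helper: splits on runs of word characters, as suggested above.
    public static String[] separators(String text) {
        return text.split("\\w+");
    }

    public static void main(String[] args) {
        String text = "hello, darkness   my old friend.\nI've   come to you again\r\nasd\n 123123";
        String[] seps = separators(text);
        // Print each element in brackets, with line breaks escaped, so whitespace stays visible.
        for (String s : seps) {
            System.out.println("[" + s.replace("\r", "\\r").replace("\n", "\\n") + "]");
        }
        System.out.println(seps.length + " elements"); // 13 elements, the first one empty
    }
}
```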

Answer 1: (score: 0)

Try this:

String[] separators = text.split("[\\w']+");

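This defines the non-separators as "word characters" and/or apostrophes.

It leaves a leading empty string in the result array, which cannot be avoided except by removing the leading word first: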

String[] separators = text.replaceAll("^[\\w']+", "").split("[\\w']+");
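
The empty first element follows from how String.split works: when the input starts with a match of the pattern, the empty text before that match is returned as the first element, while trailing empty strings are dropped. Here is a minimal sketch of that behavior and of the workaround above. The class name LeadingEmptyDemo and the method separators are illustrative names, not part of the answer.

```java
public class LeadingEmptyDemo {
    // Assumed helper: strip the leading word first, then split on words.
    public static String[] separators(String text) {
        return text.replaceAll("^[\\w']+", "").split("[\\w']+");
    }

    public static void main(String[] args) {
        // Input starts with a word, so the first element is the empty string.
        System.out.println("a b".split("[\\w']+")[0].isEmpty());   // true
        // Input starts with a separator, so the first element is that separator.
        System.out.println(" a b".split("[\\w']+")[0].isEmpty());  // false

        String text = "hello, darkness   my old friend.\nI've   come to you again\r\nasd\n 123123";
        System.out.println(separators(text).length);               // 11
    }
}
```

On the string from the question this yields the first form of the desired output, with ".\n" as a single separator.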

If you treat hyphenated words (for example, "non-separators") as one word, you could add the hyphen to the character class, i.e.

String[] separators = text.split("[\\w'-]+");

See the live demo.
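
To illustrate what adding the hyphen changes, here is a small sketch on a made-up input. The string "well-known fact" and the class name HyphenDemo are assumptions for the example. The hyphen is placed last in the character class so that it is read literally rather than as a range.

```java
public class HyphenDemo {
    // Words are runs of word characters and apostrophes; a hyphen is a separator.
    public static String[] withoutHyphen(String text) {
        return text.split("[\\w']+");
    }

    // Words may also contain hyphens, so "well-known" stays in one piece.
    public static String[] withHyphen(String text) {
        return text.split("[\\w'-]+");
    }

    public static void main(String[] args) {
        String text = "well-known fact";
        System.out.println(withoutHyphen(text).length); // 3: "", "-", " "
        System.out.println(withHyphen(text).length);    // 2: "", " "
    }
}
```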

Answer 2: (score: 0)

I think it is easier to get the desired result with the .find() method:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "hello, darkness   my old friend.\nI've   come to you again\r\nasd\n 123123";

String pat = "[\\s,.]+"; // add all that you need to the character class
Matcher m = Pattern.compile(pat).matcher(text);

List<String> list = new ArrayList<>();

while (m.find()) {
    list.add(m.group());
}

// the result is already stored in "list" but if you
// absolutely want to store the result in an array, just do:

String[] result = list.toArray(new String[0]);

This avoids the problem of the empty string at the beginning.
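
If the second form of the desired output is preferred, with each punctuation mark reported separately from the whitespace around it, the same find() loop can be used with an alternation instead of a single character class. This is only a sketch: the pattern [,.]|\s+ and the names SeparatorFinder and findSeparators are assumptions for illustration, not something from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SeparatorFinder {
    // Each match is either a single comma or period, or one run of whitespace.
    private static final Pattern SEPARATOR = Pattern.compile("[,.]|\\s+");

    public static List<String> findSeparators(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SEPARATOR.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "hello, darkness   my old friend.\nI've   come to you again\r\nasd\n 123123";
        List<String> seps = findSeparators(text);
        System.out.println(seps.size()); // 13
    }
}
```

Because only matches are collected, there is no leading empty string to deal with here either.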