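I want to find all the separators between the words in a sentence. A separator can be a space or a newline.
Say I have the following string: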
String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";
String[] separators = text.split("\\S+");
Output: [, , , , ,
, , , , ,
,
]
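So I split on everything except whitespace. It returns an empty separator first, and the rest are fine. Why is the first string empty?
Also, I would like punctuation such as periods and commas to count as part of a separator, but I don't know how to do that. For example, ".\n" should be one separator.
Desired output for the string above: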
separators = {", ", " ", " ", " ", ".\n", " ", " ", " ", " ", "\r\n", "\n "}
or
separators = {",", " ", " ", " ", " ", ".", "\n", " ", " ", " ", " ", "\r\n", "\n "}
Answer 0 (score: 0)
I think this would work as well:
String[] separators = text.split("\\w+");
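A quick check of this suggestion, as a sketch only: on the sample string from the question, `\\w+` does not treat the apostrophe as part of a word, so "I've" is cut in two and "'" shows up as a separator. The leading empty string is also still present. The class and method names below are hypothetical.

```java
class WordSplitCheck {
    // Sample string from the question.
    static final String TEXT =
            "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";

    // Splits on runs of word characters ([a-zA-Z_0-9]); what remains are the separators.
    static String[] separators(String text) {
        return text.split("\\w+");
    }

    public static void main(String[] args) {
        for (String s : separators(TEXT)) {
            // Make line breaks visible when printing.
            System.out.println("\"" + s.replace("\r", "\\r").replace("\n", "\\n") + "\"");
        }
    }
}
```

On the sample this yields 13 elements: the empty string first, and "'" between ".\n" and the following " ".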
Answer 1 (score: 0)
Try this:
String[] separators = text.split("[\\w']+");
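This defines non-separators as "word characters" and/or apostrophes.
It leaves a leading empty string in the result array, which cannot be avoided except by removing the leading word first: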
String[] separators = text.replaceAll("^[\\w']+", "").split("[\\w']+");
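As for why the empty string appears: `String.split` includes an empty leading substring whenever the input begins with a positive-width match of the pattern, while trailing empty strings are dropped by default. Here the text starts with "hello", which matches, so the first element is "". The sketch below compares both variants on the sample string from the question; the class and method names are hypothetical.

```java
import java.util.Arrays;

class LeadingEmptyDemo {
    static final String TEXT =
            "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";

    // Plain split: the text starts with a word, so element 0 is "".
    static String[] plain(String text) {
        return text.split("[\\w']+");
    }

    // Remove the leading word first, then split: no leading "".
    static String[] stripped(String text) {
        return text.replaceAll("^[\\w']+", "").split("[\\w']+");
    }

    public static void main(String[] args) {
        System.out.println(plain(TEXT).length);    // 12, first element is ""
        System.out.println(stripped(TEXT).length); // 11
        System.out.println(Arrays.toString(stripped(TEXT)));
    }
}
```

On the sample, the stripped variant returns the 11 separators listed in the first desired output of the question.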
If you consider hyphenated words (an example appears above) to be one word, you may want to add the hyphen to the character class, i.e.
String[] separators = text.split("[\\w'-]+");
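To illustrate the difference, here is a small sketch on a made-up input ("well-known fact" is not from the question, and the class and method names are hypothetical). With the hyphen at the end of the character class it is taken literally, so the hyphenated word stays in one piece.

```java
class HyphenDemo {
    // Hyphen is NOT part of a word: "-" is reported as a separator.
    static String[] withoutHyphen(String text) {
        return text.split("[\\w']+");
    }

    // Hyphen IS part of a word: "well-known" is skipped as a single word.
    static String[] withHyphen(String text) {
        return text.split("[\\w'-]+");
    }

    public static void main(String[] args) {
        String sample = "well-known fact"; // hypothetical input
        System.out.println(withoutHyphen(sample).length); // 3: "", "-", " "
        System.out.println(withHyphen(sample).length);    // 2: "", " "
    }
}
```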
See the live demo.
Answer 2 (score: 0)
I think it is easier to get the desired result with the .find() method:
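import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";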
String pat = "[\\s,.]+"; // add all that you need to the character class
Matcher m = Pattern.compile(pat).matcher(text);
List<String> list = new ArrayList<String>();
while( m.find() ) {
list.add(m.group());
}
// the result is already stored in "list" but if you
// absolutely want to store the result in an array, just do:
String[] result = list.toArray(new String[0]);
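This avoids the problem of the empty string at the beginning, because the separators are matched directly instead of being derived from a split.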
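For convenience, the same idea can be wrapped in a method and checked against the desired output from the question. This is only a sketch: the class name `SeparatorFinder` and the method name `findSeparators` are hypothetical, and the character class is the one from this answer (whitespace, comma, period), so extend it if other punctuation should count as a separator.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorFinder {
    // Runs of whitespace, commas and periods are treated as one separator.
    private static final Pattern SEPARATOR = Pattern.compile("[\\s,.]+");

    // Collects every separator run in order of appearance.
    static List<String> findSeparators(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SEPARATOR.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "hello, darkness my old friend.\nI've come to you again\r\nasd\n 123123";
        List<String> seps = findSeparators(text);
        System.out.println(seps.size()); // 11, no empty string at the front
    }
}
```

Since only actual matches are collected, a text with no separators gives an empty list rather than an array containing the whole input.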