How can I split a sentence into words while preserving compound expressions that contain spaces?

Time: 2016-10-09 06:58:45

Tags: java string split

I need to split a String on spaces, but I need to ignore some compound keywords that themselves contain spaces. For example, I have a String as follows:

String testCase = "The patient is currently being treated for Diabetes with Thiazide diuretics";

I need to split the string, but "Thiazide diuretics" must stay together as a single compound expression after the split. A plain split on spaces breaks it into two tokens:

String[] array = testCase.split(" ");

The result must be as follows:

[The, patient, is, currently, being, treated, for, Diabetes, with, Thiazide diuretics]

How can I do this?

3 answers:

Answer 0 (score: 5)

In this case you need to work with the regular expression directly; .split() is not suited to your purpose.

import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "The patient is currently being treated for Diabetes with Thiazide diuretics";

// Try the compound keyword first; otherwise fall back to any run of non-whitespace.
Matcher m = Pattern.compile("\\b(?:Thiazide diuretics)\\b|\\S+").matcher(s);
ArrayList<String> result = new ArrayList<>();
while (m.find()) {
    result.add(m.group());
}
System.out.println(result);
// [The, patient, is, currently, being, treated, for, Diabetes, with, Thiazide diuretics]

Note: technically, it is possible to do this with .split() by using lookarounds:

String s = "Thiazide not-a-keyword diuretics and Thiazide diuretics keyword";

String[] result = s.split("(?<!Thiazide) | (?!diuretics)");
System.out.println(Arrays.toString(result));
// [Thiazide, not-a-keyword, diuretics, and, Thiazide diuretics, keyword]

But this does not scale once you have more keywords, so try to avoid it.
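For several keywords, the Matcher approach from the first snippet can be extended by building the alternation from a list. The following is only a minimal sketch: the class name CompoundTokenizer, the method tokenize, and the extra phrases "loop diuretics" and "Type 2 Diabetes" are made up for illustration and do not come from the question. Each phrase is escaped with Pattern.quote, and longer phrases are tried first so that a phrase that is a prefix of another one does not win. The sketch assumes every phrase starts and ends with a word character, since it wraps the alternation in \b.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class CompoundTokenizer {

    // Hypothetical helper: splits text on whitespace but keeps each listed phrase as one token.
    static List<String> tokenize(String text, List<String> phrases) {
        String regex = "\\S+";
        if (!phrases.isEmpty()) {
            String alternation = phrases.stream()
                    // longest first, so "Thiazide diuretics" is tried before a shorter "Thiazide"
                    .sorted(Comparator.comparingInt(String::length).reversed())
                    // escape regex metacharacters inside the phrases
                    .map(Pattern::quote)
                    .collect(Collectors.joining("|"));
            regex = "\\b(?:" + alternation + ")\\b|\\S+";
        }
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("Thiazide diuretics", "loop diuretics", "Type 2 Diabetes");
        String text = "Type 2 Diabetes treated with Thiazide diuretics and loop diuretics";
        System.out.println(tokenize(text, phrases));
        // [Type 2 Diabetes, treated, with, Thiazide diuretics, and, loop diuretics]
    }
}
```

Adding a keyword here only means adding an entry to the list, whereas the lookaround-based split pattern would have to be rewritten for every new keyword.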

Answer 1 (score: 0)

Something like this:

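// Splits source on spaces while keeping every occurrence of drugName as one token.
String[] splits(String source, String drugName) {
    source = source.trim();
    if (source.isEmpty()) {
        // "".split(" ") would return [""], so return no tokens instead
        return new String[0];
    }
    int pos = source.indexOf(drugName);
    if (pos != -1) {
        // pos is the first occurrence, so the text before it contains no further match
        String[] internal = splits(source.substring(0, pos), drugName);
        String[] rest = splits(source.substring(pos + drugName.length()), drugName);
        String[] result = new String[internal.length + rest.length + 1];
        System.arraycopy(internal, 0, result, 0, internal.length);
        result[internal.length] = drugName;
        System.arraycopy(rest, 0, result, internal.length + 1, rest.length);
        return result;
    }
    return source.split(" ");
}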

Answer 2 (score: -2)

You can try using a regular expression, for example:

static String[] mysplit(String str) {
    Pattern p = Pattern.compile("(?<!Thiazide) | (?!diuretics)");
    return p.split(str);
}