Split a string on keywords, but not where a keyword appears inside a quoted string (Java)

Time: 2020-07-01 22:37:10

Tags: java arrays regex string split

How can I split the text below using these split criteria: FIRST, NOW, THEN?

String text = "FIRST i go to the homepage NOW i click on button \"NOW CLICK\" very quick THEN i will become a text result.";

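Expected are 3 sentences:

  1. FIRST i go to the homepage
  2. NOW i click on button "NOW CLICK" very quick
  3. THEN i will become a text result.

This code does not work correctly because of the "NOW CLICK" button text, which also contains the keyword NOW: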

String[] textArray = text.split("FIRST|NOW|THEN");

5 answers:

Answer 0 (score: 4)

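If I understand you correctly, you

  • want to split the text on the keywords FIRST, NOW and THEN, and keep each keyword in the resulting part
  • but do not want to split when one of these keywords appears inside quotes.

If my guess is correct, then instead of the split method you can use find to iterate over all

  • quotes
  • words that are not inside quotes
  • whitespace.

This lets you append every quote and every run of whitespace to the result unchanged, and focus only on the words outside quotes to check whether the text should be split at them.

A regex representing these parts could look like Pattern.compile("\"[^\"]*\"|\\S+|\\s+");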

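Important: we need to search for "..." first, otherwise \\S+ would match "NOW CLICK" as two separate parts, "NOW and CLICK", which would prevent it from being treated as a single quote. That is why the "[^"]*" regex (which represents a quote) is placed at the start of the subregex1|subregex2|subregex3 series.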
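To make the effect of the ordering concrete, here is a minimal sketch that tokenizes a fragment of the text with both orderings. The class name AlternationOrderDemo and the helper tokens are made up for this illustration and are not part of the answer; whitespace tokens are dropped only to keep the printed result short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AlternationOrderDemo {

    // Hypothetical helper: returns every non-whitespace token the regex finds.
    public static List<String> tokens(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            if (!m.group().trim().isEmpty()) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "on button \"NOW CLICK\" very quick";

        // Quote alternative first: the quoted phrase stays one token.
        System.out.println(String.join(" | ", tokens("\"[^\"]*\"|\\S+|\\s+", text)));
        // prints: on | button | "NOW CLICK" | very | quick

        // \S+ first: the quote is torn apart at the space.
        System.out.println(String.join(" | ", tokens("\\S+|\"[^\"]*\"|\\s+", text)));
        // prints: on | button | "NOW | CLICK" | very | quick
    }
}
```

The regex engine tries the alternatives from left to right at each position, so whichever alternative can match an opening quote first decides how the quoted phrase is tokenized.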

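This regex lets us iterate over the text

FIRST i go to the homepage NOW i click on button "NOW CLICK" very quick THEN i will become a text result.

as the following tokens (brackets mark the word and quote tokens; each space between them is a whitespace token of its own):

[FIRST] [i] [go] [to] [the] [homepage] [NOW] [i] [click] [on] [button] ["NOW CLICK"] [very] [quick] [THEN] [i] [will] [become] [a] [text] [result.]

Note that "NOW CLICK" is treated as a single token. So even though it contains a keyword we want to split on, it will never be equal to that keyword, because it contains other characters, at the very least the surrounding quotes. This prevents it from being treated as a delimiter that should split the text.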

Using this idea, we can write code like this:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "FIRST i go to the homepage NOW i click on button \"NOW CLICK\" very quick THEN i will become a text result.";
List<String> keywordsToSplitOn = List.of("FIRST", "NOW", "THEN");

// search for quotes ".." | words | whitespace
Pattern p = Pattern.compile("\"[^\"]*\"|\\S+|\\s+");
Matcher m = p.matcher(text);

StringBuilder sb = new StringBuilder();
List<String> result = new ArrayList<>();
while (m.find()) {
    String token = m.group();
    // a keyword outside quotes starts a new part (unless nothing was collected yet)
    if (keywordsToSplitOn.contains(token) && sb.length() != 0) {
        result.add(sb.toString());
        sb.setLength(0); // clear sb
    }
    sb.append(token);
}
if (sb.length() != 0) { // include the rest of the text after the last keyword
    result.add(sb.toString());
}

result.forEach(System.out::println);

Output:

FIRST i go to the homepage 
NOW i click on button "NOW CLICK" very quick 
THEN i will become a text result.

Answer 1 (score: 3)

You need to use lookahead and lookbehind (a short introduction was linked here).

Just change the regex in the split method to the following:

String[] textArray = text.split("((?=FIRST)|(?=NOW(?! CLICK))|(?=THEN))");

Even better, include a space in each expression to avoid splitting on, for example, NOWHERE:

String[] textArray = text.split("((?=FIRST )|(?=NOW (?!CLICK))|(?=THEN ))");
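As a rough, runnable check of the second variant, the sketch below wraps it in a small class. The class name LookaheadSplitDemo, the helper split and the NOWHERE sample sentence are invented for this illustration. Two assumptions are worth stating: the redundant outer parentheses are dropped, and the code relies on Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element.

```java
import java.util.Arrays;
import java.util.List;

public class LookaheadSplitDemo {

    // Splits before FIRST, NOW or THEN when a space follows the keyword,
    // except before the literal phrase "NOW CLICK".
    public static List<String> split(String text) {
        return Arrays.asList(text.split("(?=FIRST )|(?=NOW (?!CLICK))|(?=THEN )"));
    }

    public static void main(String[] args) {
        String text = "FIRST i go to the homepage NOW i click on button "
                + "\"NOW CLICK\" very quick THEN i will become a text result.";

        for (String part : split(text)) {
            System.out.println("[" + part + "]");
        }
        // prints:
        // [FIRST i go to the homepage ]
        // [NOW i click on button "NOW CLICK" very quick ]
        // [THEN i will become a text result.]

        // NOWHERE is left alone because no space follows "NOW".
        System.out.println(split("FIRST go NOWHERE THEN stop"));
        // prints: [FIRST go NOWHERE , THEN stop]
    }
}
```

Note that this approach only protects the one phrase NOW CLICK. Other quoted text containing a keyword would still be split, and a word that merely ends in a keyword (such as SNOW followed by a space) would also trigger a split unless a \b is added in front of each keyword.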

Answer 2 (score: 1)

You can use Pattern and Matcher to divide the input into groups:

Pattern pattern = Pattern.compile("^(FIRST.*?)(NOW.*?)(THEN.*)$");

String text = "FIRST i go to the homepage NOW i click on button \"NOW CLICK\" very quick THEN i will become a text result.";

Matcher matcher = pattern.matcher(text);
        
if (matcher.find()) {
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
    System.out.println(matcher.group(3));
}

Output:

FIRST i go to the homepage 
NOW i click on button "NOW CLICK" very quick 
THEN i will become a text result.

Answer 3 (score: 1)

You can match the following regular expression.

/\bFIRST +(?:(?!\bNOW\b)[^\n])+(?<! )|\bNOW +(?:(?!\bTHEN\b)[^\n])+(?<! )|\bTHEN +.*/


Java's regex engine performs the following operations.

\bFIRST +      : match 'FIRST' preceded by a word boundary,
                 followed by 1+ spaces
(?:            : begin a non-capture group
  (?!\bNOW\b)  : use a negative lookahead to assert that
                 the following chars are not 'NOW'  
  [^\n]        : match any char other than a line terminator
)              : end non-capture group
+              : execute non-capture group 1+ times
(?<! )         : use negative lookbehind to assert that the
                 previous char is not a space
|              : or
\bNOW +        : match 'NOW' preceded by a word boundary,
                 followed by 1+ spaces
(?:            : begin a non-capture group
  (?!\bTHEN\b) : use a negative lookahead to assert that
                 the following chars are not 'THEN'  
  [^\n]        : match any char other than a line terminator
)              : end non-capture group
+              : execute non-capture group 1+ times
(?<! )         : use negative lookbehind to assert that the
                 previous char is not a space
|              : or
\bTHEN +.*     : match 'THEN' preceded by a word boundary,
                 followed by 1+ spaces then 0+ chars

This uses a technique known as the tempered greedy token solution.

Answer 4 (score: 0)

You can use the following:
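The answer gives the regex only in slash-delimited form, so here is a minimal sketch of how it might be used from Java. The class name TemperedTokenDemo and the method sentences are hypothetical; the pattern itself is the one above, with backslashes doubled for a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemperedTokenDemo {

    // The regex from the answer, split over three lines, one per alternative.
    private static final Pattern SENTENCE = Pattern.compile(
            "\\bFIRST +(?:(?!\\bNOW\\b)[^\\n])+(?<! )"
          + "|\\bNOW +(?:(?!\\bTHEN\\b)[^\\n])+(?<! )"
          + "|\\bTHEN +.*");

    // Collects each match instead of splitting the text.
    public static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "FIRST i go to the homepage NOW i click on button "
                + "\"NOW CLICK\" very quick THEN i will become a text result.";
        sentences(text).forEach(s -> System.out.println("[" + s + "]"));
        // prints:
        // [FIRST i go to the homepage]
        // [NOW i click on button "NOW CLICK" very quick]
        // [THEN i will become a text result.]
    }
}
```

Because the NOW alternative only stops at THEN, the second NOW inside "NOW CLICK" is consumed as ordinary text. The negative lookbehind (?<! ) makes each match give back its trailing space, so the parts come out trimmed.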

public static void main(String args[]) { 
    String text = "FIRST i go to the homepage NOW i click on button \"NOW CLICK\" very quick THEN i will become a text result.";
    String[] textArray = text.split("(?=FIRST)|(?=\\b NOW \\b)|(?=THEN)");
    
    for(String s: textArray) {
        System.out.println(s);
    }
}

Output:

FIRST i go to the homepage
 NOW i click on button "NOW CLICK" very quick 
THEN i will become a text result.