Java - Find the words before and after a given word in a string

Time: 2017-05-26 22:40:24

Tags: java regex

Say I have a string

String str = "This problem sucks and is hard";

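I want to get the words before and after "problem", so "This" and "sucks". Is regex the best way to do this (keep in mind that I'm a regex beginner), or does Java have some kind of library (e.g. StringUtils) that can do it for me?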

3 answers:

Answer 0: (score: 0)

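Apache has a StringUtils library with methods that return the substring before and after a given string. Java's own substring method can also get you what you need.

Apache StringUtils library API: https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html

The methods you probably need are substringBefore() and substringAfter().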

https://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/StringUtils.html#substringBefore(java.lang.String,%20java.lang.String)

If you want to explore Java's own API, take a look at this: Java: Getting a substring from a string starting after a particular character
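As a rough illustration of the plain-JDK route mentioned above, here is a minimal sketch that uses only String.indexOf and String.substring. The class name SubstringNeighbors and the method neighbors are made up for this example, and it assumes words are separated by single spaces. Note that indexOf also matches inside longer words (searching for "problem" would hit "problems"), so this is not a robust solution.

```java
class SubstringNeighbors {
    // Hypothetical helper: returns {wordBefore, wordAfter}.
    // An entry is "" when there is no such word; the result is null
    // when target does not occur in text at all.
    static String[] neighbors(String text, String target) {
        int start = text.indexOf(target);
        if (start < 0) {
            return null;
        }
        // Everything before and after the first occurrence of target.
        String before = text.substring(0, start).trim();
        String after = text.substring(start + target.length()).trim();

        // Last space-separated token of "before", first token of "after".
        String previous = before.substring(before.lastIndexOf(' ') + 1);
        int firstSpace = after.indexOf(' ');
        String next = firstSpace < 0 ? after : after.substring(0, firstSpace);
        return new String[] { previous, next };
    }

    public static void main(String[] args) {
        String[] result = neighbors("This problem sucks and is hard", "problem");
        System.out.println(result[0] + ", " + result[1]); // prints "This, sucks"
    }
}
```

StringUtils.substringBefore and substringAfter would replace the two substring calls, but the word extraction at the end would still be needed.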

Answer 1: (score: 0)

To find the words before and after a given word, you can use this regex:

(\w+)\W+problem\W+(\w+)

The capture groups are the words you're looking for.

In Java, that would be:

Pattern p = Pattern.compile("(\\w+)\\W+problem\\W+(\\w+)");

Matcher m = p.matcher("This problem sucks and is hard");
if (m.find())
    System.out.printf("'%s', '%s'", m.group(1), m.group(2));

Output

'This', 'sucks'

If you need full Unicode support, add the UNICODE_CHARACTER_CLASS flag, or use its inline form (?U):

Pattern p = Pattern.compile("(?U)(\\w+)\\W+problema\\W+(\\w+)");

Matcher m = p.matcher("Questo problema è schifoso e dura");
if (m.find())
    System.out.printf("'%s', '%s'", m.group(1), m.group(2));

Output

'Questo', 'è'
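To make the effect of the flag visible, the sketch below runs the same pattern with and without Pattern.UNICODE_CHARACTER_CLASS, passing it as a compile flag instead of the inline (?U). The class name UnicodeFlagDemo and the method wordAfter are made up for this example. Without the flag, \w covers only ASCII letters, digits and underscore, so "è" is treated as a separator and skipped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeFlagDemo {
    // Hypothetical helper: returns the word captured after "problema",
    // or null if the pattern does not match.
    static String wordAfter(String text, int flags) {
        Pattern p = Pattern.compile("(\\w+)\\W+problema\\W+(\\w+)", flags);
        Matcher m = p.matcher(text);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // \u00e8 is the letter è, escaped to avoid source-encoding issues.
        String text = "Questo problema \u00e8 schifoso e dura";
        // ASCII-only \w: è counts as a non-word character.
        System.out.println(wordAfter(text, 0)); // prints "schifoso"
        // Unicode-aware \w: è is a word on its own.
        System.out.println(wordAfter(text, Pattern.UNICODE_CHARACTER_CLASS)); // prints "è"
    }
}
```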

To find multiple matches, use a while loop:

Pattern p = Pattern.compile("(?U)(\\w+)\\W+problems\\W+(\\w+)");

Matcher m = p.matcher("Big problems or small problems, they are all just problems, man!");
while (m.find())
    System.out.printf("'%s', '%s'%n", m.group(1), m.group(2));

Output

'Big', 'or'
'small', 'they'
'just', 'man'
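One caveat with the loop: each match consumes the word after the search word, so when that same word is also the word before the next occurrence, the second pair is missed. A possible workaround, sketched here as an assumption and not as part of the original answer, is to capture the following word inside a lookahead so it is not consumed. The class name OverlappingNeighbors and the method pairs are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingNeighbors {
    // Hypothetical helper: collects "before/after" for every match of regex,
    // which is expected to define capture groups 1 and 2.
    static List<String> pairs(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1) + "/" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "a problems b problems c";
        // Consuming form: "b" is swallowed by the first match.
        System.out.println(pairs("(\\w+)\\W+problems\\W+(\\w+)", text));     // prints [a/b]
        // Lookahead form: "b" stays available as the next "before" word.
        System.out.println(pairs("(\\w+)\\W+problems(?=\\W+(\\w+))", text)); // prints [a/b, b/c]
    }
}
```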

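Note: using \W+ allows symbols to appear between the words, e.g. "No(!) problem here" will still find "No" and "here".

Also note that digits count as word characters, so numbers are treated as words: "I found 1 problem here" returns "1" and "here".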
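The two notes can be checked with a small helper that takes the search word as a parameter. This is only a sketch: the class name RegexNeighbors and the method neighbors are made up here, and Pattern.quote is added so that a search word containing regex metacharacters is matched literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexNeighbors {
    // Hypothetical helper: returns {wordBefore, wordAfter} for the first
    // match, or null when the word has no neighbor on both sides.
    static String[] neighbors(String text, String word) {
        Pattern p = Pattern.compile("(\\w+)\\W+" + Pattern.quote(word) + "\\W+(\\w+)");
        Matcher m = p.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Symbols between words are absorbed by \W+.
        String[] a = neighbors("No(!) problem here", "problem");
        System.out.println(a[0] + ", " + a[1]); // prints "No, here"
        // Digits match \w, so "1" counts as a word.
        String[] b = neighbors("I found 1 problem here", "problem");
        System.out.println(b[0] + ", " + b[1]); // prints "1, here"
    }
}
```

Because the pattern requires a word on both sides, a search word at the very start or end of the text yields no match in this sketch.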

Answer 2: (score: 0)

It's a bit verbose, but this gets the job done accurately and quickly:

import java.util.ArrayList;
import java.util.List;
public class HelloWorld{

public static void main(String []args){
    String EntireString="Hello World this is a test";
    String SearchWord="World";
    System.out.println(getPreviousWordFromString(EntireString,SearchWord));
}
 
public static String getPreviousWordFromString(String EntireString, String SearchWord) {
    // Indices where each word ends; used later as cut points for substring().
    List<Integer> IndicesOfWords = new ArrayList<>();

    boolean isWord = false;

    int indexOfSearchWord=-1;

    if(EntireString.indexOf(SearchWord)!=-1) {
        indexOfSearchWord = EntireString.indexOf(SearchWord)-1;
    } else {
        System.out.println("ERROR: SearchWord passed (2nd arg) does not exist in string EntireString. EntireString: "+EntireString+" SearchWord: "+SearchWord);
        return "";
    }
    
    if(EntireString.indexOf(SearchWord)==0) {
        System.out.println("ERROR: The search word passed is the first word in the search string, so there are no words before it.");
        return "";
    }

    for (int i = 0; i < EntireString.length(); i++) {
        if (Character.isLetter(EntireString.charAt(i)) && i != indexOfSearchWord) {
            isWord = true;                                    
        } else if (!Character.isLetter(EntireString.charAt(i)) && isWord) {
            IndicesOfWords.add(i);
            isWord = false;
        } else if (Character.isLetter(EntireString.charAt(i)) && i == indexOfSearchWord) {
            IndicesOfWords.add(i);
        }
    }
    
    if(IndicesOfWords.size()>0) {
        boolean isFirstWordAWord=true;
        for (int i = 0; i < IndicesOfWords.get(0); i++) {
            if(!Character.isLetter(EntireString.charAt(i))) {
                isFirstWordAWord=false;
            }
        }
        if(isFirstWordAWord) {
            // The string starts with a word, so record index 0 as its start.
            IndicesOfWords.add(0,0);
        }
    } else {
        return "";
    }
    
    String ResultingWord = "";


    for (int i = IndicesOfWords.size()-1; i >= 0; i--) {

        if (EntireString.substring(IndicesOfWords.get(i)).contains(SearchWord)) { 
            if (i > 0) {
                ResultingWord=EntireString.substring(IndicesOfWords.get(i-1),IndicesOfWords.get(i));
                break;
            }
            if (i==0) {
                ResultingWord=EntireString.substring(IndicesOfWords.get(0),IndicesOfWords.get(1));
            }
        }
    }

    // The extracted span can start with the separator before the word; strip it.
    return ResultingWord.trim();
}
}
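For comparison, here is a shorter sketch that returns both the previous and the next word by splitting the string into tokens. It is not the method used in this answer: the class name SplitNeighbors and the method neighbors are made up for this example, and it assumes that any run of non-word characters (\W+) separates words.

```java
class SplitNeighbors {
    // Hypothetical helper: returns {wordBefore, wordAfter} for the first
    // token equal to word. An entry is "" when there is no such neighbor;
    // the result is null when word is not a token of text.
    static String[] neighbors(String text, String word) {
        String[] tokens = text.trim().split("\\W+");
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(word)) {
                String previous = i > 0 ? tokens[i - 1] : "";
                String next = i < tokens.length - 1 ? tokens[i + 1] : "";
                return new String[] { previous, next };
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] result = neighbors("Hello World this is a test", "World");
        System.out.println(result[0] + ", " + result[1]); // prints "Hello, this"
    }
}
```

Comparing whole tokens with equals avoids matching the search word inside a longer word, which indexOf-based approaches do.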