Get the words around a given position in a string

Asked: 2013-05-05 18:56:59

Tags: java string

I want to get the words surrounding a certain position in a string. For example, the two words before it and the two words after it.

For example, consider the string:

String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
String find = "I";

for (int index = str.indexOf("I"); index >= 0; index = str.indexOf("I", index + 1))
{
    System.out.println(index);
}

This prints the indices at which the word "I" occurs. But I would like to get a substring made of the words around those positions.

I would like to be able to print "John and I like to" and "and hiking I have two".

It should not be limited to single words: searching for "John and" should return "name is John and I like".

Is there a clean, clever way to do this?

5 Answers:

Answer 0 (score: 11)

Single word:

You can use String's split() method to achieve this. This solution is O(n).

public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and "+
                         "hiking I have two sisters and one brother.";
    String find = "I";

    String[] sp = str.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) { // start at 0 so matches in the first two words are not skipped
        if (sp[i].equals(find)) {
            // guard every neighbour index to avoid ArrayIndexOutOfBoundsException
            String surr = (i-2 >= 0 ? sp[i-2]+" " : "") +
                          (i-1 >= 0 ? sp[i-1]+" " : "") +
                          sp[i] +
                          (i+1 < sp.length ? " "+sp[i+1] : "") +
                          (i+2 < sp.length ? " "+sp[i+2] : "");
            System.out.println(surr);
        }
    }
}

Output:

John and I like to
and hiking I have two

Multiple words:

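When find is a multi-word phrase, a regular expression is a nice, clean solution. However, by its nature, it misses the cases where the surrounding words also match find (see the example below).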

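The following algorithm handles all cases (the whole solution space). Keep in mind that, because of the nature of the problem, this solution is O(n*m) in the worst case (where n is the length of str and m is the length of find).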

public static void main(String[] args) {
    String str = "Hello my name is John and John and I like to go...";
    String find = "John and";

    String[] sp = str.split(" +"); // "+" for multiple spaces

    String[] spMulti = find.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) { // start at 0 so a match at the beginning is not skipped
        int j = 0;
        while (j < spMulti.length && i+j < sp.length 
                                  && sp[i+j].equals(spMulti[j])) {
            j++;
        }           
        if (j == spMulti.length) { // found spMulti entirely
            StringBuilder surr = new StringBuilder();
            if (i-2 >= 0){ surr.append(sp[i-2]); surr.append(" "); }
            if (i-1 >= 0){ surr.append(sp[i-1]); surr.append(" "); }
            for (int k = 0; k < spMulti.length; k++) {
                if (k > 0){ surr.append(" "); }
                surr.append(sp[i+k]);
            }
            if (i+spMulti.length < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length]);
            }
            if (i+spMulti.length+1 < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length+1]);
            }
            System.out.println(surr.toString());
        }
    }
}

Output:

name is John and John and
John and John and I like
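
The remark about regular expressions missing overlapping cases can be illustrated with a small sketch. The class name RegexOverlapDemo, the method regexContexts and the pattern itself are assumptions made for this illustration, not the regex the answer had in mind. The pattern grabs up to two words, the phrase, then up to two more words, so every match consumes its own context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexOverlapDemo {
    // Illustrative pattern: up to two words, the phrase, then up to two words.
    static List<String> regexContexts(String str, String find) {
        Pattern p = Pattern.compile(
            "(?:\\S+\\s+){0,2}" + Pattern.quote(find) + "(?:\\s+\\S+){0,2}");
        Matcher m = p.matcher(str);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group()); // the next search resumes after this match
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        for (String s : regexContexts(str, "John and")) {
            System.out.println(s);
        }
    }
}
```

With this input the sketch reports a single match, "name is John and John and", because the second "John and" has already been consumed as trailing context of the first. The word-by-word loop above reports both occurrences.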

Answer 1 (score: 2)

Here is another way I found, using a regular expression:

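        String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";

        String find = "I";

        // Requires java.util.regex.Pattern and java.util.regex.Matcher.
        // Group 1: the two words before find; group 2: the two words after it.
        // Pattern.quote keeps regex metacharacters in find from being interpreted.
        Pattern pattern = Pattern.compile("(\\S+\\s+\\S+)\\s+" + Pattern.quote(find) + "\\s+(\\S+\\s+\\S+)");
        Matcher matcher = pattern.matcher(str);

        while (matcher.find())
        {
            System.out.println(matcher.group(1));
            System.out.println(matcher.group(2));
        }

Output:

John and
like to
and hiking
have two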

Answer 2 (score: 1)

Use String.split() to split the text into words. Then search for "I" and join the surrounding words together:

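String[] parts = str.split(" ");

for (int i = 0; i < parts.length; i++) {
    if (parts[i].equals("I")) {
        // Assumes i-2 .. i+2 are all valid indices; see the note below.
        String out = parts[i-2] + " " + parts[i-1] + " " + parts[i] + " " + parts[i+1] + " " + parts[i+2];
        System.out.println(out);
    }
}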

Of course, you need to check that i-2 is a valid index, and if you have a large amount of data, using a StringBuffer would be handy.
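
The index check mentioned above might look like the following sketch. The names ContextWords and around are made up for illustration, and it uses StringBuilder (the unsynchronized counterpart of StringBuffer) on the assumption that the code runs in a single thread.

```java
import java.util.ArrayList;
import java.util.List;

class ContextWords {
    // Hypothetical helper: each occurrence of `word` together with up to
    // `radius` words on either side, skipping indices outside the array.
    static List<String> around(String str, String word, int radius) {
        String[] parts = str.trim().split("\\s+");
        List<String> results = new ArrayList<>();
        for (int i = 0; i < parts.length; i++) {
            if (!parts[i].equals(word)) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            for (int j = i - radius; j <= i + radius; j++) {
                if (j < 0 || j >= parts.length) {
                    continue; // index falls off either end of the array
                }
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(parts[j]);
            }
            results.add(sb.toString());
        }
        return results;
    }

    public static void main(String[] args) {
        String str = "I like to go fishing and hiking I have two sisters";
        for (String s : around(str, "I", 2)) {
            System.out.println(s);
        }
    }
}
```

For the shortened sentence in main, the first match sits at index 0, so only the words after it are included ("I like to"), while the second match gets its full context ("and hiking I have two").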

Answer 3 (score: 1)

// Convert sentence to a List (Arrays.asList returns a fixed-size list view of the array)
String[] stringArray = sentence.split(" ");
List<String> stringList = Arrays.asList(stringArray);

// Which word should be matched?
String toMatch = "I";

// How many words before and after do you want?
int before = 2;
int after = 2;

for (int i = 0; i < stringList.size(); ++i) {
    if (toMatch.equals(stringList.get(i))) {
        int index = i;
        if (0 <= index - before && index + after < stringList.size()) { // index + after is read below, so it must be a valid index
            StringBuilder sb = new StringBuilder();

            for (int j = index - before; j <= index + after; ++j) { // j, because i is already declared by the outer loop
                sb.append(stringList.get(j));
                sb.append(" ");
            }
            String result = sb.toString().trim();
            //Do something with result
        }
    }
}

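This extracts the two words before and after the match. It could be extended to print up to two words before and after, rather than exactly two.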
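
The "up to two words" extension might be sketched by clamping the window to the list bounds instead of rejecting matches near the edges. WindowDemo and window are illustrative names, not part of the answer.

```java
import java.util.Arrays;
import java.util.List;

class WindowDemo {
    // Hypothetical helper: the words from (index - before) to (index + after),
    // clamped so the window never leaves the list.
    static String window(List<String> words, int index, int before, int after) {
        int from = Math.max(0, index - before);
        int to = Math.min(words.size(), index + after + 1); // subList end is exclusive
        return String.join(" ", words.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("I like to go hiking".split(" "));
        System.out.println(window(words, 0, 2, 2)); // match at the very start
        System.out.println(window(words, 4, 2, 2)); // match at the very end
    }
}
```

Here the first call yields "I like to" and the second "to go hiking": each window simply shrinks on the side where the sentence runs out.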

Edit: Damn... too slow, and without the fancy ternary operators :/

Answer 4 (score: 0)

// Requires: import java.util.ArrayList; import java.util.List;
public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and hiking I have two    sisters and one brother.";
    String find = "I";
    int countWords = 3;
    List<String> strings = countWordsBeforeAndAfter(str, find, countWords);
    strings.stream().forEach(System.out::println);
}

public static List<String> countWordsBeforeAndAfter(String paragraph, String search, int countWordsBeforeAndAfter){
    List<String> searchList = new ArrayList<>();
    String str = paragraph;
    String find = search;
    int countWords = countWordsBeforeAndAfter;
    String[] sp = str.split(" +"); // "+" for multiple spaces
    for (int i = 0; i < sp.length; i++) {
        if (sp[i].equals(find)) {

            String before = "";
            for (int j = countWords; j > 0; j--) {
                if(i-j >= 0) before += sp[i-j]+" ";
            }

            String after = "";
            for (int j = 1; j <= countWords; j++) {
                if(i+j < sp.length) after += " " + sp[i+j];
            }
            String searchResult = before + find + after;
            searchList.add(searchResult);
        }
    }
    return searchList;
}