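I want to get the words surrounding a given position in a string. For example, the two words before it and the two words after it.
For example, consider the string: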
String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
String find = "I";
for (int index = str.indexOf("I"); index >= 0; index = str.indexOf("I", index + 1))
{
    System.out.println(index);
}
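This prints the indices at which the word "I" occurs. But I would like to get the substring of words around those positions.
I want to be able to print "John and I like to" and "and hiking I have two".
It should not be limited to single-word search strings. Searching for "John and" would return "name is John and I like".
Is there a neat, clever way to do this?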
Answer 0: (score: 11)
You can use String's split() method to achieve this. This solution is O(n).
public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and " +
                 "hiking I have two sisters and one brother.";
    String find = "I";
    String[] sp = str.split(" +"); // " +" matches one or more spaces
    // Start at 0 so that a match among the first two words is not skipped.
    for (int i = 0; i < sp.length; i++) {
        if (sp[i].equals(find)) {
            // The bounds checks avoid an ArrayIndexOutOfBoundsException near either end.
            String surr = (i-2 >= 0 ? sp[i-2]+" " : "") +
                          (i-1 >= 0 ? sp[i-1]+" " : "") +
                          sp[i] +
                          (i+1 < sp.length ? " "+sp[i+1] : "") +
                          (i+2 < sp.length ? " "+sp[i+2] : "");
            System.out.println(surr);
        }
    }
}
Output:
John and I like to
and hiking I have two
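When find is a multi-word phrase, a regular expression is a nice, clean solution. However, because of how regex matching works, it misses the cases where the surrounding words also match find (see the example below).
The following algorithm handles all cases (the whole solution space). Keep in mind that, due to the nature of the problem, this solution is O(n * m) in the worst case (n being the length of str and m the length of find).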
public static void main(String[] args) {
    String str = "Hello my name is John and John and I like to go...";
    String find = "John and";
    String[] sp = str.split(" +");       // " +" matches one or more spaces
    String[] spMulti = find.split(" +"); // " +" matches one or more spaces
    // Start at 0 so that a match among the first two words is not skipped.
    for (int i = 0; i < sp.length; i++) {
        // Count how many consecutive words of spMulti match, starting at sp[i].
        int j = 0;
        while (j < spMulti.length && i+j < sp.length
                && sp[i+j].equals(spMulti[j])) {
            j++;
        }
        if (j == spMulti.length) { // found spMulti entirely
            StringBuilder surr = new StringBuilder();
            // Up to two words before the match
            if (i-2 >= 0){ surr.append(sp[i-2]); surr.append(" "); }
            if (i-1 >= 0){ surr.append(sp[i-1]); surr.append(" "); }
            // The matched phrase itself
            for (int k = 0; k < spMulti.length; k++) {
                if (k > 0){ surr.append(" "); }
                surr.append(sp[i+k]);
            }
            // Up to two words after the match
            if (i+spMulti.length < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length]);
            }
            if (i+spMulti.length+1 < sp.length) {
                surr.append(" ");
                surr.append(sp[i+spMulti.length+1]);
            }
            System.out.println(surr.toString());
        }
    }
}
Output:
name is John and John and
John and John and I like
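To illustrate the regex limitation mentioned above, here is a minimal sketch. It assumes a pattern of the form "two words, the phrase, two words"; the class name RegexOverlapDemo and the helper regexWindows are hypothetical names chosen for this example, not part of any answer's code. Because each regex match consumes the two trailing words, the second "John and" is already used up and is never reported as a match of its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexOverlapDemo {
    // Hypothetical helper: collects every non-overlapping regex match of
    // "two words, the phrase, two words" in str.
    static List<String> regexWindows(String str, String find) {
        // Pattern.quote keeps any regex metacharacters in find literal.
        Pattern p = Pattern.compile(
                "\\S+\\s+\\S+\\s+" + Pattern.quote(find) + "\\s+\\S+\\s+\\S+");
        Matcher m = p.matcher(str);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String str = "Hello my name is John and John and I like to go...";
        // Prints a single line: name is John and John and
        for (String s : regexWindows(str, "John and")) {
            System.out.println(s);
        }
    }
}
```

The split-based loop above reports both occurrences because it tests every word index independently instead of resuming after the end of the previous match.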
Answer 1: (score: 2)
Here is another way I found, using a regular expression:
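// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
String find = "I";
// Group 1: the two words before find; group 2: the two words after it.
// Pattern.quote keeps any regex metacharacters in find literal.
Pattern pattern = Pattern.compile("(\\S+\\s+\\S+)\\s+" + Pattern.quote(find) + "\\s+(\\S+\\s+\\S+)");
Matcher matcher = pattern.matcher(str);
while (matcher.find())
{
    System.out.println(matcher.group(1));
    System.out.println(matcher.group(2));
}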
Output:
John and
like to
and hiking
have two
Answer 2: (score: 1)
Use String.split() to split the text into words. Then search for "I" and join the surrounding words together:
String[] parts = str.split(" ");
for (int i = 0; i < parts.length; i++) {
    if (parts[i].equals("I")) {
        // No bounds checks yet: assumes i-2 >= 0 and i+2 < parts.length
        String out = parts[i-2] + " " + parts[i-1] + " " + parts[i] + " " + parts[i+1] + " " + parts[i+2];
    }
}
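Of course, you need to check that i-2 is a valid index (and likewise i+2), and if you have a large amount of data, a StringBuffer (or StringBuilder) will come in handy.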
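The index check and the builder mentioned above could be combined roughly as follows. This is only a sketch: the class name SurroundingWords, the helper window and its radius parameter are hypothetical names introduced for illustration, and the sample sentence is shortened so that "I" also appears at both ends.

```java
class SurroundingWords {
    // Hypothetical helper: joins up to `radius` words on each side of parts[i].
    // The window is clamped to the array bounds, so no index is ever invalid.
    static String window(String[] parts, int i, int radius) {
        int from = Math.max(0, i - radius);
        int to = Math.min(parts.length - 1, i + radius); // inclusive
        StringBuilder sb = new StringBuilder();
        for (int k = from; k <= to; k++) {
            if (k > from) {
                sb.append(' ');
            }
            sb.append(parts[k]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String str = "I like to go fishing and hiking I have two sisters and I";
        String[] parts = str.split(" ");
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].equals("I")) {
                System.out.println(window(parts, i, 2));
            }
        }
        // Prints:
        // I like to
        // and hiking I have two
        // sisters and I
    }
}
```

Clamping with Math.max and Math.min keeps the bounds logic in one place instead of repeating a check for every neighbouring index.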
Answer 3: (score: 1)
// Requires: import java.util.Arrays; import java.util.List;
// Convert the sentence to a List of words
String[] stringArray = sentence.split(" ");
List<String> stringList = Arrays.asList(stringArray);
// Which word should be matched?
String toMatch = "I";
// How many words before and after do you want?
int before = 2;
int after = 2;
for (int i = 0; i < stringList.size(); ++i) {
    if (toMatch.equals(stringList.get(i))) {
        int index = i;
        // Only proceed when the full window fits inside the list
        if (0 <= index - before && index + after < stringList.size()) {
            StringBuilder sb = new StringBuilder();
            // Use j here: redeclaring i inside this loop would not compile
            for (int j = index - before; j <= index + after; ++j) {
                sb.append(stringList.get(j));
                sb.append(" ");
            }
            String result = sb.toString().trim();
            // Do something with result
        }
    }
}
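This extracts the two words before and after the match. It could be extended to print at most two words before and after, rather than exactly two.
Edit: Damn... way too slow, and without the fancy ternary operators :/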
Answer 4: (score: 0)
// Requires: import java.util.ArrayList; import java.util.List;
public static void main(String[] args) {
    String str = "Hello my name is John and I like to go fishing and hiking I have two sisters and one brother.";
    String find = "I";
    int countWords = 3;
    List<String> strings = countWordsBeforeAndAfter(str, find, countWords);
    strings.forEach(System.out::println);
}

public static List<String> countWordsBeforeAndAfter(String paragraph, String search, int countWordsBeforeAndAfter){
    List<String> searchList = new ArrayList<>();
    String str = paragraph;
    String find = search;
    int countWords = countWordsBeforeAndAfter;
    String[] sp = str.split(" +"); // " +" matches one or more spaces
    for (int i = 0; i < sp.length; i++) {
        if (sp[i].equals(find)) {
            // Up to countWords words before the match, in reading order
            String before = "";
            for (int j = countWords; j > 0; j--) {
                if (i-j >= 0) before += sp[i-j]+" ";
            }
            // Up to countWords words after the match
            String after = "";
            for (int j = 1; j <= countWords; j++) {
                if (i+j < sp.length) after += " " + sp[i+j];
            }
            String searchResult = before + find + after;
            searchList.add(searchResult);
        }
    }
    return searchList;
}