Extracting strings that contain specific words

Date: 2017-01-08 08:31:24

Tags: java regex

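This code extracts the sentences that contain a specific word. The problem is that if I want to extract several sentences based on different words, I have to copy it multiple times. Is there a way to do this in one pass, perhaps by feeding it an array?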

String o = "Trying to extract this string. And also the one next to it.";       
String[] sent = o.split("\\.");
List<String> output = new ArrayList<String>();
for (String sentence : sent) {
    if (sentence.contains("this")) {
        output.add(sentence);
    }
}       
System.out.println(">>output=" + output);

5 answers:

Answer 0 (score: 0)

You can try this:

String o = "Trying to extract this string. And also the one next to it.";
String[] sent = o.split("\\.");
List<String> keyList = new ArrayList<String>();
keyList.add("this");
keyList.add("these");
keyList.add("that");

List<String> output = new ArrayList<String>();

for (String sentence : sent) {
    for (String key : keyList) {
        if (sentence.contains(key)) {
            output.add(sentence);
            break;
        }
    }
}
System.out.println(">>output=" + output);
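
To avoid copying the loop for every keyword, the same logic can be wrapped in a method that takes the keywords as an array. This is only a minimal sketch: the class name `SentenceFilter` and the method name `extract` are made up for illustration, and the call to `trim()` is an addition that the answer above does not make.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceFilter {

    // Hypothetical helper: returns every sentence of `text` that contains
    // at least one of the keywords as a substring (case-sensitive).
    static List<String> extract(String text, String... keywords) {
        List<String> output = new ArrayList<>();
        for (String sentence : text.split("\\.")) {
            for (String key : keywords) {
                if (sentence.contains(key)) {
                    // trim() drops the space left behind after each period
                    output.add(sentence.trim());
                    break; // one match is enough; do not add the sentence twice
                }
            }
        }
        return output;
    }

    public static void main(String[] args) {
        String o = "Trying to extract this string. And also the one next to it.";
        System.out.println(">>output=" + extract(o, "this", "next"));
    }
}
```

Because `contains` does a substring test, the keyword "this" would also match a sentence containing "thistle". The later answers show ways to match whole words instead.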

Answer 1 (score: 0)

String sentence = "First String. Second Int. Third String. Fourth Array. Fifth Double. Sixth Boolean. Seventh String";
List<String> output = new ArrayList<String>();

for(String each: sentence.split("\\.")){
    if(inKeyword(each)) output.add(each);
}

System.out.println(output);

Helper function:

public static boolean inKeyword(String currentSentence){
    String[] keyword = {"int", "double"};

    for(String each: keyword){
        if(currentSentence.toLowerCase().contains(each)) return true;
    }

    return false;
}

Answer 2 (score: 0)

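If you have a collection of words to filter on, called filter, and an array of sentences, you can use Collections.disjoint to check whether the words of a sentence overlap with the filter words. Unfortunately, this does not work if you filter on "However" and your sentence contains "However,", because splitting on spaces leaves the comma attached to the word.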

Collection<String> filter = /**/;
String[] sentences = /**/;
List<String> result = new ArrayList<>();
for(String sentence : sentences) {
    Collection<String> words = Arrays.asList(sentence.split(" "));
    // disjoint() is true when the two collections share no element, so negate it to keep overlapping sentences
    if (!Collections.disjoint(words, filter)) {
        result.add(sentence);
    }        
}
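
One way around the punctuation problem is to split on runs of non-word characters instead of a single space, so that "However," yields the token "However". This is a sketch under that assumption: the class name `DisjointFilter`, the method name `keepMatching`, and the sample sentences are illustrative only, and the comparison is still case-sensitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

public class DisjointFilter {

    // Hypothetical helper: keeps the sentences that share at least one
    // whole word with `filter`.
    static List<String> keepMatching(String[] sentences, Collection<String> filter) {
        List<String> result = new ArrayList<>();
        for (String sentence : sentences) {
            // \W+ matches any run of characters other than ASCII letters,
            // digits, and underscore, so punctuation is stripped from tokens
            List<String> words = Arrays.asList(sentence.split("\\W+"));
            if (!Collections.disjoint(words, filter)) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] sentences = {"However, it works", "Nothing to see here"};
        System.out.println(keepMatching(sentences, Arrays.asList("However")));
    }
}
```

The trade-off is that words containing an apostrophe or hyphen, such as "don't", are broken into several tokens, so filter words of that kind would need a different tokenizer.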

Answer 3 (score: 0)

Using streams (splitting into sentences, then into words):

    String o = "Trying to extract this string. And also the one next to it.";
    Set<String> words = new HashSet<>(Arrays.asList("this", "also"));

    List<String> output = Arrays.stream(o.split("\\.")).filter(
            sentence -> Arrays.stream(sentence.split("\\s")).anyMatch(
                    word -> words.contains(word)
            )
    ).collect(Collectors.toList());

    System.out.println(">>output=" + output);

Answer 4 (score: 0)

You can use String.matches as follows:

String sentence = ...;
if (sentence.matches(".*(you|can|use).*")) { // Or:
if (sentence.matches(".*\\b(you|can|use)\\b.*")) { // With word boundaries

if (sentence.matches("(?i).*(you|can|use).*")) { // Case insensitive ("You")

In Java 8, the following variations are possible:

String pattern = ".*(you|can|use).*";

String pattern = new StringJoiner("|", ".*(", ").*")
    .add("you")
    .add("can")
    .add("use")
    .toString();
// Or a stream on the words with a joining collector

Arrays.stream(o.split("\\.\\s*"))
    .filter(sentence -> sentence.matches(pattern))
    .forEach(System.out::println);
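
The "stream on the words with a joining collector" variant mentioned in the comment might look like the sketch below. It is an illustration rather than the answerer's code: the class name `RegexFilter`, the method names, and the sample text are invented, and it swaps `String.matches` for a precompiled `Pattern` used with `find()`, which makes the surrounding `.*` unnecessary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class RegexFilter {

    // Hypothetical helper: builds \b(w1|w2|...)\b from the given words.
    static Pattern wordPattern(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote) // escape regex metacharacters in the words
                .collect(Collectors.joining("|", "\\b(", ")\\b"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }

    // Hypothetical helper: keeps the sentences containing any of the words.
    static List<String> extract(String text, List<String> words) {
        Pattern pattern = wordPattern(words); // compiled once, reused per sentence
        return Arrays.stream(text.split("\\.\\s*"))
                .filter(sentence -> pattern.matcher(sentence).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String o = "You can try this. Nothing here. It is of no use.";
        extract(o, Arrays.asList("you", "can", "use")).forEach(System.out::println);
    }
}
```

The word boundaries keep "use" from matching inside "because", and `Pattern.quote` guards against keywords that happen to contain characters such as "." or "+".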