Question

I'm fairly new to Java and need some help extracting multiple substrings from a string. Here is an example of the string:

String = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/."

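Expected result: WRB MD PRP VB DT NN IN NNS POS JJ NNS

I have a text file with possibly thousands of POS-tagged lines like this one, and I need to extract the POS tags from it and run some calculations based on them.

I tried a tokenizer, but it didn't really give me what I wanted. I also tried split() and saving the result to an array, since I need to store the tags and use them later, but that didn't work either.

Finally, I tried a pattern matcher, and I ran into a problem with the regex: it returns the tags with the forward slash still attached.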

Regex: [\/](.*?)\s\b
Result: /WRB /MD ....

Please let me know if there is a better way to do this, or if someone can help me figure out what is wrong with my regex.

Answer 1

This should work:

String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
System.out.println(string.replaceAll("[^/]+/([^ ]+ ?)", "$1"));

Prints: WRB MD PRP VB DT NN IN NNS POS JJ NNS .

Answer 2

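If you still want to use pattern matching, take a look at positive lookbehinds. A lookbehind lets you match the text that follows a slash without making the slash itself part of the match.

An example would be:

(?<=/).+?(?= |$)

This matches anything that is preceded by a slash and followed by a space or the end of the string.

Here is a working example in Java: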

import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.LinkedList;

public class SO {
    public static void main(String[] args) {
        String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";
        Pattern pattern = Pattern.compile("(?<=/).+?(?= |$)");
        Matcher matcher = pattern.matcher(string);

        LinkedList<String> list = new LinkedList<String>();

        // Loop through and find all matches and store them into the List
        while(matcher.find()) { 
            list.add(matcher.group()); 
        }

        // Print out the contents of this List
        for(String match : list) { 
            System.out.println(match); 
        }
    }
}
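
A capturing group is another way to get the same result, and it also shows why the regex in the question returned the slash: matcher.group() returns the whole match, slash included, while matcher.group(1) returns only what the parentheses captured. The following is a minimal sketch, not a definitive implementation; the class name TagExtractor and the method extractTags are hypothetical, and it assumes every token contains exactly one slash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExtractor {
    // A slash followed by a captured run of non-whitespace characters.
    private static final Pattern TAG = Pattern.compile("/(\\S+)");

    // Hypothetical helper: returns the tags of one line in order of appearance.
    static List<String> extractTags(String line) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = TAG.matcher(line);
        while (matcher.find()) {
            // group() would be "/WRB"; group(1) is just "WRB".
            tags.add(matcher.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extractTags("How/WRB can/MD I/PRP")); // prints [WRB, MD, PRP]
    }
}
```

Because the tags end up in a List, they can be stored and processed later, which is what the question asks for.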

Answer 3

String string = "How/WRB can/MD I/PRP find/VB a/DT list/NN of/IN celebrities/NNS '/POS real/JJ names/NNS ?/.";

string = string.replaceAll("\\S+/", "").replace(".", "");

System.out.println(string);

Answer 4

How about str = str.replaceAll("\\S+/", "")? It removes every run of non-whitespace characters that is followed by a slash, leaving only the tags.
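
Since the question mentions storing the tags in an array for later use, the replacement can be combined with split(). This is only a sketch under stated assumptions: the class name TagSplitter and the method tagsOf are hypothetical, and the input line is assumed to be non-empty and whitespace-separated.

```java
import java.util.Arrays;

public class TagSplitter {
    // Hypothetical helper: strips each "word/" prefix, then splits on whitespace.
    static String[] tagsOf(String line) {
        // "\\S+/" consumes each word up to and including its slash.
        String tagsOnly = line.replaceAll("\\S+/", "");
        // trim() guards against leading or trailing spaces before splitting.
        return tagsOnly.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] tags = tagsOf("How/WRB can/MD I/PRP");
        System.out.println(Arrays.toString(tags)); // prints [WRB, MD, PRP]
    }
}
```

Note that an empty input line would yield an array containing a single empty string, so callers may want to skip blank lines first.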
