Java regex to extract the string between two words

Time: 2013-06-03 17:20:02

Tags: java regex

I have a string that looks like this:

<br/><description>Using a combination of remote probes, (TCP/IP, SMB, HTTP, NTP, SNMP, etc...) it is possible to guess the name of the remote operating system in use, and sometimes its version.</description><br/><fname>os_fingerprint.nasl</fname><br/><plugin_modification_date>2012/12/01</plugin_modification_date><br/><plugin_name>OS Identification</plugin_name><br/><plugin_publication_date>2003/12/09</plugin_publication_date><br/><plugin_type>combined</plugin_type><br/><risk_factor>None</risk_factor><br/><solution>n/a</solution><br/><synopsis>It is possible to guess the remote operating system.</synopsis><br/><plugin_output><br/>Remote operating system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>Confidence Level : 99<br/>Method : MSRPC<br/><br/> <br/>The remote host is running Microsoft Windows Server 2008 R2 Enterprise Service Pack 1</plugin_output><br/>

I want to locate "Remote operating system :" and extract the "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1" that follows it, i.e. from this fragment:

Remote operating system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>

So I wrote this regular expression:
Pattern pattern = Pattern.compile("(?<=\\bRemote operating system :\\b).*?(?=\\b<br/>\\b)");

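But my regex doesn't seem to work. Any ideas? Also, is this a good way to extract the operating system string, or should I take a different approach? Thanks!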

4 answers:

Answer 0: (score: 2)

Try this pattern: ".*Remote operating system : (.*?)<br/>"

public static void main(String[] args) throws Exception {
    String s = "<br/><description>Using a combination of remote probes, (TCP/IP, SMB, HTTP, NTP, SNMP, etc...) it is possible to guess the name of the remote operating system in use, and sometimes its version.</description><br/><fname>os_fingerprint.nasl</fname><br/><plugin_modification_date>2012/12/01</plugin_modification_date><br/><plugin_name>OS Identification</plugin_name><br/><plugin_publication_date>2003/12/09</plugin_publication_date><br/><plugin_type>combined</plugin_type><br/><risk_factor>None</risk_factor><br/><solution>n/a</solution><br/><synopsis>It is possible to guess the remote operating system.</synopsis><br/><plugin_output><br/>Remote operating system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>Confidence Level : 99<br/>Method : MSRPC<br/><br/> <br/>The remote host is running Microsoft Windows Server 2008 R2 Enterprise Service Pack 1</plugin_output><br/>";

    Pattern pattern = Pattern.compile(".*Remote operating system : (.*?)<br/>");
    Matcher m = pattern.matcher(s);
    if (m.find()) {
      System.out.println(m.group(1));
    }
    else System.out.println("Not found");
}

Answer 1: (score: 0)

Your regex has no space between the : and the \\b.

Try it this way:

Pattern.compile("(?<=\\bRemote operating system : \\b).*?(?=\\b<br/>\\b)");
//                                               ^additional space

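Without that space, \\b sits between the : and the space that follows it, so it cannot match the start of the new word (Microsoft). It cannot match as the end of a word either, because : is not a word character.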
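
To make the boundary behaviour concrete, here is a minimal sketch that runs both patterns against a shortened stand-in for the string in the question. The class name WordBoundaryDemo, the helper firstMatch, and the shortened input are illustrative assumptions, not part of the original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names and the shortened input are illustrative only.
class WordBoundaryDemo {

    // Returns the first match of the given regex in the input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the plugin output in the question.
        String input = "<plugin_output><br/>Remote operating system : "
                + "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1"
                + "<br/>Confidence Level : 99<br/>";

        // Original: \b sits between ':' and ' ', two non-word characters, so it cannot match.
        String original = "(?<=\\bRemote operating system :\\b).*?(?=\\b<br/>\\b)";
        // Fixed: the added space puts \b between ' ' and 'M', which is a word boundary.
        String fixed = "(?<=\\bRemote operating system : \\b).*?(?=\\b<br/>\\b)";

        System.out.println(firstMatch(original, input)); // prints null
        System.out.println(firstMatch(fixed, input));    // prints the OS name
    }
}
```

Note that the \\b inside the lookahead still requires a word character on each side of <br/>, so this pattern would presumably miss an OS name that ends in something like a closing parenthesis. Dropping those two \\b would make it less fragile.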

Answer 2: (score: 0)

String test = 
        "<br/><description>Using a combination of remote probes, " +
        "(TCP/IP, SMB, HTTP, NTP, SNMP, etc...) it is possible to guess " +
        "the name of the remote operating system in use, and sometimes " +
        "its version.</description><br/><fname>os_fingerprint.nasl</fname>" +
        "<br/><plugin_modification_date>2012/12/01</plugin_modification_date>" +
        "<br/><plugin_name>OS Identification</plugin_name><br/>" +
        "<plugin_publication_date>2003/12/09</plugin_publication_date><br/>" +
        "<plugin_type>combined</plugin_type><br/><risk_factor>None</risk_factor>" +
        "<br/><solution>n/a</solution><br/><synopsis>It is possible to guess the " +
        "remote operating system.</synopsis><br/><plugin_output><br/>Remote operating " +
        "system : Microsoft Windows Server 2008 R2 Enterprise Service Pack 1<br/>" +
        "Confidence Level : 99<br/>Method : MSRPC<br/><br/> <br/>The remote host is " +
        "running Microsoft Windows Server 2008 R2 Enterprise Service Pack 1" +
        "</plugin_output><br/>";
        Pattern pattern = Pattern.compile("Remote\\soperating\\ssystem\\s:\\s(.+?)\\<br/>");
        Matcher matcher = pattern.matcher(test);
        if (matcher.find()) {
            System.out.println(matcher.group(1));
        }

Output:

Microsoft Windows Server 2008 R2 Enterprise Service Pack 1

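Note that using regular expressions on markup languages is generally discouraged. Here, though, you are matching a specific text string that merely happens to sit inside markup, so I think it is fine.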
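
Since the question also asks whether another approach would work, here is a regex-free sketch under the same assumption: the value always starts after the literal "Remote operating system : " and ends at the next "<br/>". The class MarkerExtractor and the method between are hypothetical names introduced only for this illustration.

```java
// Hypothetical helper; class and method names are illustrative, not from the question.
class MarkerExtractor {

    // Returns the text between the first occurrence of prefix and the next
    // occurrence of terminator after it, or null if either marker is missing.
    static String between(String input, String prefix, String terminator) {
        int start = input.indexOf(prefix);
        if (start < 0) {
            return null;
        }
        start += prefix.length();
        int end = input.indexOf(terminator, start);
        if (end < 0) {
            return null;
        }
        return input.substring(start, end);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the plugin output in the question.
        String input = "<plugin_output><br/>Remote operating system : "
                + "Microsoft Windows Server 2008 R2 Enterprise Service Pack 1"
                + "<br/>Confidence Level : 99<br/>";
        System.out.println(between(input, "Remote operating system : ", "<br/>"));
    }
}
```

This relies on exact literal markers, so it is stricter than the regex with \\s: a tab or a double space after the colon would make it return null.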

Answer 3: (score: 0)

Try the following:

if (str.matches("^.*Remote operating system : ([^<]*).*$")) {
    System.out.println(
        str.replaceAll("^.*Remote operating system : ([^<]*).*$", "$1")
    );
}