Parsing a line with Pattern.compile

Time: 2012-03-23 22:12:49

Tags: java xml parsing design-patterns matcher

I'm trying to parse the following line (stored in myline) in Java, but it keeps giving me null.

What I'm trying to extract is '000000010'.

String myline = "<status> <id>000000010</id> <created_at>2012/03/11</created_at> <text>@joerogan Played as Joe Savage Rogan in Undisputed3 Career mode, won Pride GP, got UFC title shot against Shields, lost 3 times, and retired</text> <retweet_count>0</retweet_count> <user> <name>Siggi Eggertsson</name> <location>Berlin, Germany</location> <description></description> <url>http://www.siggieggertsson.com</url> </user></status>";
Pattern p = Pattern.compile("(?i)<id.*?>(.+?)</id>", Pattern.DOTALL);
Matcher m = p.matcher(myline);
String id = m.group(1);

Any suggestions?

4 answers:

Answer 0 (score: 3)

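I strongly recommend using an XML parser. Java ships with one built in; here is an example solution to your problem. Exception handling is omitted for simplicity.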

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Parse the string into a DOM tree, then read the text of the first <id> element.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
String input = "<status> <id>000000010</id> <created_at>2012/03/11</created_at> <text>@joerogan Played as Joe Savage Rogan in Undisputed3 Career mode, won Pride GP, got UFC title shot against Shields, lost 3 times, and retired</text> <retweet_count>0</retweet_count> <user> <name>Siggi Eggertsson</name> <location>Berlin, Germany</location> <description></description> <url>http://www.siggieggertsson.com</url> </user></status>";
Document document = builder.parse(new InputSource(new StringReader(input)));
String value = document.getElementsByTagName("id").item(0).getTextContent();
System.out.println(value);
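For reference, here is a minimal sketch of the same DOM approach as a self-contained class, with the checked exceptions that the answer leaves out declared rather than handled. The class name StatusIdDom and the helper firstIdText are hypothetical, and the input is an abridged copy of the question's string. One deliberate difference: NodeList.item(0) returns null when there is no <id> element, so the snippet above would throw a NullPointerException in that case; this helper returns null instead.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class StatusIdDom {

    // Hypothetical helper: returns the text of the first <id> element,
    // or null if the document contains no <id> element at all.
    static String firstIdText(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));
        NodeList ids = document.getElementsByTagName("id");
        // item(0) would be null for an empty list, so check the length first.
        return ids.getLength() > 0 ? ids.item(0).getTextContent() : null;
    }

    public static void main(String[] args) throws Exception {
        // Abridged version of the question's input.
        String input = "<status> <id>000000010</id> <created_at>2012/03/11</created_at>"
                + " <user> <name>Siggi Eggertsson</name> </user></status>";
        System.out.println(firstIdText(input)); // prints 000000010
    }
}
```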

Answer 1 (score: 2)

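You shouldn't be using regular expressions to parse XML in the first place.

Apart from that, though, you aren't using the regex correctly. Creating a Matcher object isn't enough; you also have to tell it to actually search, for example by calling find(), before you can read a group: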

if (m.find())
{
    id = m.group(1);
}
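To illustrate, the sketch below keeps the question's pattern unchanged and only adds the find() call. According to the Matcher documentation, calling group() before any match has been attempted throws an IllegalStateException rather than returning null. The class and method names (FindBeforeGroup, extractId) are hypothetical, and the input is abridged from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindBeforeGroup {

    // Same pattern as in the question: case-insensitive, DOTALL, reluctant group.
    static final Pattern ID = Pattern.compile("(?i)<id.*?>(.+?)</id>", Pattern.DOTALL);

    // Hypothetical helper: returns the first <id> value, or null if there is no match.
    static String extractId(String line) {
        Matcher m = ID.matcher(line);
        // find() performs the search; group() is only valid after it returns true.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String myline = "<status> <id>000000010</id> <created_at>2012/03/11</created_at></status>";

        Matcher fresh = ID.matcher(myline);
        try {
            fresh.group(1); // no match has been attempted yet
        } catch (IllegalStateException e) {
            System.out.println("group() before find() threw: " + e.getMessage());
        }

        System.out.println(extractId(myline)); // prints 000000010
    }
}
```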

Answer 2 (score: 0)

This site may give you some information on parsing XML in Java: http://www.java-samples.com/showtutorial.php?tutorialid=152

Answer 3 (score: 0)

This works:

String myline = "<status> <id>000000010</id> <created_at>2012/03/11</created_at> <text>@joerogan Played as Joe Savage Rogan in Undisputed3 Career mode, won Pride GP, got UFC title shot against Shields, lost 3 times, and retired</text> <retweet_count>0</retweet_count> <user> <name>Siggi Eggertsson</name> <location>Berlin, Germany</location> <description></description> <url>http://www.siggieggertsson.com</url> </user></status>";
Pattern p = Pattern.compile(".*<id>(.+)</id>.*");
Matcher m = p.matcher(myline);
if (m.matches()) {
    String id = m.group(1);
    System.out.println(id);
}
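The two versions in this answer differ mainly in matches() versus find(). matches() succeeds only if the pattern covers the entire input, which is why the first version pads the pattern with .* on both sides; find() looks for the pattern anywhere in the input. A rough sketch of that difference, assuming an abridged copy of the question's line; the class and helper names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchesVsFind {

    // First version: padded with .* so that matches() can consume the whole line.
    static final Pattern PADDED = Pattern.compile(".*<id>(.+)</id>.*");
    // Edited version: no padding, intended for find().
    static final Pattern BARE = Pattern.compile("<id>(.+)</id>");

    // Returns group(1) if the pattern matches the entire input, else null.
    static String viaMatches(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    // Returns group(1) of the first match found anywhere in the input, else null.
    static String viaFind(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String myline = "<status> <id>000000010</id> <created_at>2012/03/11</created_at></status>";
        System.out.println(viaMatches(PADDED, myline)); // 000000010
        System.out.println(viaFind(BARE, myline));      // 000000010
        // The line starts with <status>, not <id>, so the unpadded pattern fails matches().
        System.out.println(viaMatches(BARE, myline));   // null
    }
}
```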

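[Edit:] This also works, and is simpler: find() searches anywhere in the string, so the pattern does not need the leading and trailing .*: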

String myline = "<status> <id>000000010</id> <created_at>2012/03/11</created_at> <text>@joerogan Played as Joe Savage Rogan in Undisputed3 Career mode, won Pride GP, got UFC title shot against Shields, lost 3 times, and retired</text> <retweet_count>0</retweet_count> <user> <name>Siggi Eggertsson</name> <location>Berlin, Germany</location> <description></description> <url>http://www.siggieggertsson.com</url> </user></status>";
Pattern p = Pattern.compile("<id>(.+)</id>");
Matcher m = p.matcher(myline);
if (m.find()) {
    String id = m.group(1);
    System.out.println(id);
}
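One caveat with both versions: (.+) is greedy, whereas the question used the reluctant (.+?). With the single <id> in the question's line this makes no difference, but if the input ever contained two <id> elements, a greedy group would run from the first <id> to the last </id>. The sketch below uses a hypothetical two-id input (not taken from the question) to show this; the class and helper names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {

    static final Pattern GREEDY = Pattern.compile("<id>(.+)</id>");
    static final Pattern RELUCTANT = Pattern.compile("<id>(.+?)</id>");

    // Collects group(1) from every successive find() match.
    static List<String> allIds(Pattern p, String input) {
        List<String> ids = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical input with two <id> elements.
        String twoIds = "<status><id>2</id><user><id>1</id></user></status>";
        System.out.println(allIds(GREEDY, twoIds));    // [2</id><user><id>1]
        System.out.println(allIds(RELUCTANT, twoIds)); // [2, 1]
    }
}
```

For the question's own line, both patterns return the same single value, 000000010.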