I have the following text:
<Data>
<xpath>/Temporary/EIC/SpouseSSNDisqualification</xpath>
<Gist>AllConditionsTrue</Gist>
<Template>
<Text id="1">Your spouse is required to have a Social Security number instead of an ITIN to claim this credit. This is based on the IRS rules for claiming the Earned Income Credit.</Text>
</Template>
</Data>
<Data>
<xpath>/Temporary/EIC/SpouseSSNDisqualification</xpath>
<Gist>AllConditionsTrue</Gist>
<Template>
<Text id="1">Your spouse has the required Social Security number instead of an ITIN to claim this credit. This is based on the IRS rules for claiming the Earned Income Credit.</Text>
</Template>
</Data>
I want to extract the data between the xpath tags, not the tags themselves.
The output should be:
/Temporary/EIC/SpouseSSNDisqualification
/Temporary/EIC/SpouseSSNDisqualification
This regular expression seems to give me all of the text, including the xpath tags, which I don't want:
<xpath>(.+?)<\/xpath>
Edit
Here is my Java code, though I'm not sure it adds anything to my question:
try {
    String xml = FileUtils.readFileToString(file);
    Pattern p = Pattern.compile("<xpath>(.+?)<\\/xpath>");
    Matcher m = p.matcher(xml);
    while (m.find()) {
        System.out.println(m.group(0));
    }
} catch (IOException e) {
    e.printStackTrace();
}
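Answer 0: (score: 4)
Easy. It's because you are printing the whole match (group 0, which includes the tags) rather than just the value of group 1.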
String nodestr = "<xpath>/Temporary/EIC/SpouseSSNDisqualification</xpath>";
// '/' needs no escaping in a Java regex; "\/" is not a valid string escape
String regex = "<xpath>(.+?)</xpath>";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(nodestr);
// matches() requires the entire input to match; use find() to scan a larger document
if (matcher.matches()) {
    String tag_value = matcher.group(1); // taking only group 1
    System.out.println(tag_value);       // printing only group 1
}
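The snippet above handles a single tag. As a minimal sketch of the same fix applied to the full input from the question, the version below loops with find() and collects group(1) for each match. The class name XpathGroupExample and the helper extractXpaths are made up for illustration, and the sample XML is shortened from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class XpathGroupExample {
    // Same pattern as in the question, without the unnecessary escape on '/'.
    private static final Pattern XPATH_TAG = Pattern.compile("<xpath>(.+?)</xpath>");

    /** Returns the text inside every <xpath>...</xpath> pair, without the tags. */
    public static List<String> extractXpaths(String xml) {
        List<String> values = new ArrayList<>();
        Matcher m = XPATH_TAG.matcher(xml);
        while (m.find()) {          // find() moves to each successive match
            values.add(m.group(1)); // group(1) is the captured text only
        }
        return values;
    }

    public static void main(String[] args) {
        // Shortened version of the input from the question.
        String xml = "<Data>\n"
                + "<xpath>/Temporary/EIC/SpouseSSNDisqualification</xpath>\n"
                + "<Gist>AllConditionsTrue</Gist>\n"
                + "</Data>\n"
                + "<Data>\n"
                + "<xpath>/Temporary/EIC/SpouseSSNDisqualification</xpath>\n"
                + "<Gist>AllConditionsTrue</Gist>\n"
                + "</Data>";
        for (String value : extractXpaths(xml)) {
            System.out.println(value);
        }
    }
}
```

Run against the sample input, this should print the path once per Data block, with no tags around it.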
Answer 1: (score: 1)
You can try using lookbehind and lookahead:
Pattern pattern = Pattern.compile("(?<=<xpath>)(.*?)(?=</xpath>)");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
    String group = matcher.group();
    System.out.println(group);
}
I think this is a cleaner way.
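For anyone who wants to run this, here is a self-contained sketch of the lookaround approach. Because lookbehind and lookahead are zero-width, the tags are never part of the match, so group() with no argument already returns only the inner text and the capturing parentheses can be dropped. The class name XpathLookaroundExample and the helper extractXpaths are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class XpathLookaroundExample {
    // (?<=<xpath>) asserts the opening tag precedes the match;
    // (?=</xpath>) asserts the closing tag follows it. Neither is consumed.
    private static final Pattern BETWEEN_TAGS =
            Pattern.compile("(?<=<xpath>).*?(?=</xpath>)");

    /** Returns the text between each <xpath> and </xpath>, tags excluded. */
    public static List<String> extractXpaths(String xml) {
        List<String> values = new ArrayList<>();
        Matcher m = BETWEEN_TAGS.matcher(xml);
        while (m.find()) {
            values.add(m.group()); // whole match, which excludes the tags here
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<Data>\n"
                + "<xpath>/Temporary/EIC/SpouseSSNDisqualification</xpath>\n"
                + "</Data>\n"
                + "<Data>\n"
                + "<xpath>/Temporary/EIC/SpouseSSNDisqualification</xpath>\n"
                + "</Data>";
        for (String value : extractXpaths(xml)) {
            System.out.println(value);
        }
    }
}
```

One difference from the capturing-group pattern: `.*?` also matches an empty element such as `<xpath></xpath>`, whereas `.+?` requires at least one character between the tags.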