我有一系列用String表示的重复XML标签:
<Field name="foo" date="20170501">
<Value type="foo">someVal</Value>
</Field>
<Field name="foo" date="20170501">
<Value type="foo">someVal</Value>
</Field>
我正在尝试使用正则表达式(JAVA)从字段中提取名称属性,并在“值”节点中提取实际值。使用正则表达式可以吗?
我有以下接近的正则表达式,但它并不止于第一个结尾的</Field>
标记
\\<Field([^\\>]*)\\>(.+)\\</Field\\>
答案 0 :(得分:2)
如前所述,正则表达式不适合此任务,因为它的可读性和效率较低。但是无论如何...
field.xml:
<?xml version="1.0" encoding="UTF-8"?>
<Fields>
<Field name="foo 1" date="20170501">
<Value type="foo">someVal 1</Value>
</Field>
<Field name="foo 2" date="20170501">
<Value type="foo">someVal 2</Value>
</Field>
</Fields>
解决方案1:正则表达式(丑陋但有趣的方式... )
try {
byte[] encoded = Files.readAllBytes(Paths.get("path/to/fields/xml/file.xml"));
String content = new String(encoded, StandardCharsets.UTF_8);
Pattern pattern = Pattern.compile("<field[\\s\\S]*?name=\"(?<gName>[\\s\\S]*?)\"[\\s\\S]*?>[\\s\\S]*?<value\\b[\\s\\S]*?>(?<gVal>[\\s\\S]*?)</value>[\\s\\S]*?</field>", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE );
Matcher matcher = pattern.matcher(content);
// while loop for each <Field> entry
while(matcher.find()) {
matcher.group("gName"); // named group 'gName' contains the value of name attribute
matcher.group("gVal"); // named group 'gVal' contains the text content of the value tag
}
} catch (IOException e) {
e.printStackTrace();
}
解决方案2:XPath
(正确但无聊的方式... )
字段类别:
public class Field {
private String name;
private String value;
// ... getter & setters ...
@Override
public String toString() {
return String.format("Field { name: %s, value: %s }", this.name, this.value);
}
}
无聊的班级:
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class Boring {
public static void main(String[] args) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc = null;
try {
builder = factory.newDocumentBuilder();
doc = builder.parse("path/to/fields/xml/file.xml");
XPathFactory xpathFactory = XPathFactory.newInstance();
// Create XPath object
XPath xpath = xpathFactory.newXPath();
List<Field> fields = getFields(doc, xpath);
for (Field f : fields) {
System.out.println(f);
}
} catch (Exception e) {
e.printStackTrace();
}
}
private static List<Field> getFields(Document doc, XPath xpath) {
List<Field> list = new ArrayList<>();
try {
XPathExpression expr = xpath.compile("/Fields/*");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++) {
Node fieldNode = nodes.item(i);
NodeList fieldNodeChildNodes = fieldNode.getChildNodes();
Field field = new Field();
// set name
field.setName(fieldNode.getAttributes().getNamedItem("name").getNodeValue());
for (int j = 0; j < fieldNodeChildNodes.getLength(); j++) {
if (fieldNodeChildNodes.item(j).getNodeName() == "Value") {
// set value
field.setValue(fieldNodeChildNodes.item(j).getTextContent());
break;
}
}
list.add(field);
}
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return list;
}
}
输出:
Field { name: foo 1, value: someVal 1 }
Field { name: foo 2, value: someVal 2 }
答案 1 :(得分:1)
在这里使用正则表达式这样做可能不是最好的主意。但是,如果您愿意,我们可以尝试添加可选的捕获组并收集所需的数据:
<field name="(.+?)"(.+\s*)?<value.+?>(.+?)<\/value>(\s*)?<\/field>
我们可以在此处使用i
标志。
import java.util.regex.Matcher;
import java.util.regex.Pattern;
final String regex = "<field name=\"(.+?)\"(.+\\s*)?<value.+?>(.+?)<\\/value>(\\s*)?<\\/field>";
final String string = "<Field name=\"foo\" date=\"20170501\">\n"
+ " <Value type=\"foo\">someVal</Value>\n"
+ "</Field>\n"
+ "<Field name=\"foo\" date=\"20170501\">\n"
+ " <Value type=\"foo\">someVal</Value>\n"
+ "</Field>\n"
+ "<Field name=\"foo\" date=\"20170501\"><Value type=\"foo\">someVal</Value></Field>\n";
final String subst = "\\1: \\3";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
final Matcher matcher = pattern.matcher(string);
// The substituted value will be contained in the result variable
final String result = matcher.replaceAll(subst);
System.out.println("Substitution result: " + result);
此代码段只是为了说明捕获组的工作方式:
const regex = /<field name="(.+?)"(.+\s*)?<value.+?>(.+?)<\/value>(\s*)?<\/field>/gmi;
const str = `<Field name="foo" date="20170501">
<Value type="foo">someVal</Value>
</Field>
<Field name="foo" date="20170501">
<Value type="foo">someVal</Value>
</Field>
<Field name="foo" date="20170501"><Value type="foo">someVal</Value></Field>
`;
const subst = `$1: $3`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
如果不需要此表达式,可以在regex101.com中对其进行修改或更改。
jex.im还有助于可视化表达式。
答案 2 :(得分:0)