I need a regular expression that gives me an XML tag, for example <ABC/> or <ABC></ABC>.
If I use <(.)+?>, it gives me <ABC/>, <ABC>, or </ABC>, which is fine.
Now, the problem: I have this XML:
<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&C >1Y-5Y" SRC="BASE" DATA="data" ACTION="INSERT" ID="100000" GRC_PROD=""/>
Here, as you can see, PROD_TYPE="COCOG EFI LWL P&C >1Y-5Y" has a greater-than sign inside the attribute value.
So the regex returns
<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&C >
instead of the complete
<VALUE ABC="10000" PQR="12422700" ADJ="" PROD_TYPE="COCOG EFI LWL P&C >1Y-5Y" SRC="BASE" DATA="data" ACTION="INSERT" ID="100000" GRC_PROD=""/>
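I need a regex that does not treat less-than and greater-than signs as tag delimiters when they are part of a value, that is, enclosed in double quotes.
Answer 0 (score: 1)
You can try this: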
(?i)<[a-z][\w:-]+(?: [a-z][\w:-]+="[^"]*")*/?>
Explanation:
(?i)      # Match the remainder of the regex with the options: case insensitive (i)
<         # Match the character “<” literally
[a-z]     # Match a single character in the range between “a” and “z”
[\w:-]    # Match a single character present in the list below
          #    A word character (letters, digits, and underscores)
          #    The character “:”
          #    The character “-”
   +      #    Between one and unlimited times, as many times as possible, giving back as needed (greedy)
(?:       # Match the regular expression below (non-capturing group)
   \      #    Match the character “ ” (a space) literally
   [a-z]  #    Match a single character in the range between “a” and “z”
   [\w:-] #    Match a single character present in the list below
          #       A word character (letters, digits, and underscores)
          #       The character “:”
          #       The character “-”
      +   #       Between one and unlimited times, as many times as possible, giving back as needed (greedy)
   ="     #    Match the characters “="” literally
   [^"]   #    Match any character that is NOT a “"”
      *   #       Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
   "      #    Match the character “"” literally
)*        # Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
/         # Match the character “/” literally
   ?      #    Between zero and one times, as many times as possible, giving back as needed (greedy)
>         # Match the character “>” literally
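To see the difference on the sample from the question, here is a minimal Java sketch that runs both the original lazy pattern and the attribute-aware pattern above against the VALUE tag. The class name TagRegexDemo and the helper firstMatch are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagRegexDemo {
    // Pattern from the question: stops at the first '>' it sees, quoted or not.
    static final Pattern NAIVE = Pattern.compile("<(.)+?>");

    // Pattern from this answer: '>' may appear inside a double-quoted attribute value.
    static final Pattern QUOTE_AWARE = Pattern.compile(
            "(?i)<[a-z][\\w:-]+(?: [a-z][\\w:-]+=\"[^\"]*\")*/?>");

    // Hypothetical helper: returns the first match of the pattern, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String tag = "<VALUE ABC=\"10000\" PQR=\"12422700\" ADJ=\"\" "
                + "PROD_TYPE=\"COCOG EFI LWL P&C >1Y-5Y\" SRC=\"BASE\" DATA=\"data\" "
                + "ACTION=\"INSERT\" ID=\"100000\" GRC_PROD=\"\"/>";
        // Truncated at the '>' inside PROD_TYPE.
        System.out.println(firstMatch(NAIVE, tag));
        // The complete tag.
        System.out.println(firstMatch(QUOTE_AWARE, tag));
    }
}
```

Note that this pattern expects exactly one space before each attribute, double-quoted values, and names of at least two characters (because of the + after the first character class). It also does not match closing tags such as </ABC>.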
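If you want to match a whole element, either an opening tag with its content and matching closing tag or a self-closed tag, try the following regex: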
(?i)(?:<([a-z][\w:-]+)(?: [a-z][\w:-]+="[^"]*")*>.+?</\1>|<([a-z][\w:-]+)(?: [a-z][\w:-]+="[^"]*")*/>)
A Java code snippet that implements the same thing:
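// Note: String.matches() succeeds only if the ENTIRE subjectString is one element.
// To locate an element inside a longer string, use Pattern.compile(...).matcher(subjectString).find().
try {
    boolean foundMatch = subjectString.matches("(?i)(?:<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*>.+?</\\1>|<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*/>)");
} catch (PatternSyntaxException ex) { // java.util.regex.PatternSyntaxException
    // Syntax error in the regular expression
}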
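If the goal is to pull every element out of a longer string rather than test the whole string, a find() loop is one way to do it. The following is a sketch under that assumption; the class ElementFinder and the method findElements are illustrative names, not part of the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ElementFinder {
    // The combined pattern from the answer, split over two string literals for readability.
    static final Pattern ELEMENT = Pattern.compile(
            "(?i)(?:<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*>.+?</\\1>"
            + "|<([a-z][\\w:-]+)(?: [a-z][\\w:-]+=\"[^\"]*\")*/>)");

    // Hypothetical helper: collects every non-overlapping match, left to right.
    static List<String> findElements(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = ELEMENT.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "<ABC>text</ABC> <VALUE PROD_TYPE=\"P >1Y\" ID=\"10\"/>";
        for (String element : findElements(input)) {
            System.out.println(element);
        }
    }
}
```

Two limitations of the pattern as written are worth knowing: .+? needs at least one character, so an empty element such as <ABC></ABC> is not matched, and . does not match line breaks unless Pattern.DOTALL is set.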
Hope this helps...
Answer 1 (score: 1)
To expand on the point G_H linked to: don't use regex to parse XML. Use XPath to return the node, and pass that node to an identity Transformer:
Node valueElement = (Node)
XPathFactory.newInstance().newXPath().evaluate("//VALUE",
new InputSource(new StringReader(xmlDocument)),
XPathConstants.NODE);
StringWriter result = new StringWriter();
TransformerFactory.newInstance().newTransformer().transform(
new DOMSource(valueElement), new StreamResult(result));
String valueElementMarkup = result.toString();
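For completeness, the snippet above could be wrapped into a self-contained class along these lines. This is a sketch, not the answerer's code: the class XPathExtract, the method extractMarkup, the ROOT wrapper element, and the choice to omit the XML declaration are all assumptions made for the example. It also assumes the ampersand in P&C is escaped as &amp;, since a bare & is not well-formed XML and a conforming parser rejects it.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class XPathExtract {
    // Hypothetical helper: returns the markup of the first node selected by the
    // XPath expression, or null if nothing is selected.
    static String extractMarkup(String xmlDocument, String expression) {
        try {
            Node node = (Node) XPathFactory.newInstance().newXPath().evaluate(
                    expression,
                    new InputSource(new StringReader(xmlDocument)),
                    XPathConstants.NODE);
            if (node == null) {
                return null;
            }
            // A Transformer created without a stylesheet copies its input (identity transform).
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            // Assumption: only the element markup is wanted, not an <?xml ...?> header.
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter result = new StringWriter();
            identity.transform(new DOMSource(node), new StreamResult(result));
            return result.toString();
        } catch (XPathExpressionException | TransformerException e) {
            throw new IllegalStateException("Could not extract " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xmlDocument = "<ROOT><VALUE ABC=\"10000\" "
                + "PROD_TYPE=\"COCOG EFI LWL P&amp;C >1Y-5Y\" SRC=\"BASE\" ID=\"100000\"/></ROOT>";
        System.out.println(extractMarkup(xmlDocument, "//VALUE"));
    }
}
```

The serializer writes the node out again from the parsed tree, so the result is equivalent XML rather than a character-for-character copy of the input: escaping and attribute order may differ from the source text.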
Answer 2 (score: 0)
Try this as well:
<.*?(".*?".*?)*?>
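It grabs everything between < and > only when there is an even number of double quotes (") in between. A pair of double quotes marks enclosed content. Otherwise it skips that > sign and continues searching for the next > (which should occur after the closing " quote).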
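A quick way to check this behaviour is to run the pattern over the sample tag from the question. The sketch below does that; the class QuotePairDemo and the method allTags are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotePairDemo {
    // Pattern from this answer: a quoted run is tried as a pair before '>' is accepted.
    static final Pattern TAG = Pattern.compile("<.*?(\".*?\".*?)*?>");

    // Hypothetical helper: collects every non-overlapping match, left to right.
    static List<String> allTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String tag = "<VALUE ABC=\"10000\" PQR=\"12422700\" ADJ=\"\" "
                + "PROD_TYPE=\"COCOG EFI LWL P&C >1Y-5Y\" SRC=\"BASE\" DATA=\"data\" "
                + "ACTION=\"INSERT\" ID=\"100000\" GRC_PROD=\"\"/>";
        // Opening tag, closing tag, then the complete VALUE tag.
        for (String found : allTags("<ABC>text</ABC> " + tag)) {
            System.out.println(found);
        }
    }
}
```

Unlike the attribute-based pattern in Answer 0, this one also matches closing tags such as </ABC>. It is a heuristic rather than a strict rule, though: because . can itself match a double quote, a tag with an unpaired quote such as <a "b> is still matched.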