How to replace multiple substrings in a string while parsing the source string only once

Asked: 2018-02-18 14:43:17

Tags: java regex xml pattern-matching

I am trying to replace elements in an XML string in the fastest, most efficient way possible. Consider the following code:

final String rawXml = "
<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<cnpOnlineRequest merchantId=\"017872345\" merchantSdk=\"Java;12.0.0\" version=\"12.0\" xmlns=\"http://www.vantivcnp.com/schema\">
    <authentication>
        <user>AUSER</user>
        <password>pa5Sw0rd!</password>
    </authentication>
    <authorization reportGroup=\"Default Report Group\" id=\"87654321\">
        <orderId>Merchant Order Id</orderId>
        <amount>1299</amount>
        <orderSource>ecommerce</orderSource>
        <billToAddress>
            <addressLine1>5 Some Road</addressLine1>
            <city>Townsville</city>
            <state>Alabama</state>
            <zip>31431</zip>
            <country>US</country>
        </billToAddress>
        <card>
            <type>VI</type>
            <number>1234123412341234</number>
            <expDate>0718</expDate>
            <cardValidationNum>999</cardValidationNum>
            <pin>1234</pin>
        </card>
    </authorization>
</cnpOnlineRequest>";

final String expectedXml = "
<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>
<cnpOnlineRequest merchantId=\"017872345\" merchantSdk=\"Java;12.0.0\" version=\"12.0\" xmlns=\"http://www.vantivcnp.com/schema\">
    <authentication>
        <user>AUSER</user>
        <password>---sanitised---</password>
    </authentication>
    <authorization reportGroup=\"Default Report Group\" id=\"87654321\">
        <orderId>Merchant Order Id</orderId>
        <amount>1299</amount>
        <orderSource>ecommerce</orderSource>
        <billToAddress>
            <addressLine1>5 Some Road</addressLine1>
            <city>Townsville</city>
            <state>Alabama</state>
            <zip>31431</zip>
            <country>US</country>
        </billToAddress>
        <card>
            <type>VI</type>
            <number>---sanitised---</number>
            <expDate>0718</expDate>
            <cardValidationNum>---sanitised---</cardValidationNum>
            <pin>---sanitised---</pin>
        </card>
    </authorization>
</cnpOnlineRequest>";

final String[] elements = { "password", "number", "cardValidationNum", "pin" };
final Map<String,String> replacements = new LinkedHashMap<>();
for (final String element : elements) {
    final String regexp = String.format("<%s>.*</%s>", element, element);
    final String replacement = String.format("<%s>---sanitised---</%s>", element, element);
    replacements.put(regexp, replacement);
}
final String regexp = "%(" + StringUtils.join(replacements.keySet(), "|") + ")%";

final Pattern pattern = Pattern.compile(regexp, Pattern.DOTALL);
final Matcher matcher = pattern.matcher(rawXml);
final StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    matcher.appendReplacement(buffer, replacements.get(matcher.group(1)));
}
final String sanitisedXml = buffer.toString();

assertThat(sanitisedXml, equalTo(expectedXml));

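What happens is that find() finds nothing, so the buffer stays empty and the assertion fails. I also tried replacing "%(" and ")%" with ".*(" and ").*"; find() then matches, but there is only one group and it contains the entire string.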
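A likely reason find() returns nothing: in java.util.regex, `%` is an ordinary literal character, not a pattern delimiter (as it can be in PHP's preg functions), so the combined pattern only matches where the input actually contains `%`. The small check below illustrates this. The class and method names (DelimiterDemo, found) are made up for the illustration.

```java
import java.util.regex.Pattern;

class DelimiterDemo {
    // Returns true if the pattern matches anywhere in the input.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex, Pattern.DOTALL).matcher(input).find();
    }

    public static void main(String[] args) {
        String xml = "<card><pin>1234</pin></card>";
        // '%' must appear literally in the input for this pattern to match.
        System.out.println(found("%(<pin>.*</pin>)%", xml)); // false
        // Without the '%' wrappers the group matches.
        System.out.println(found("(<pin>.*</pin>)", xml));   // true
    }
}
```

Even once the pattern matches, two further points in the code above seem worth checking: the map is keyed by the regex text rather than by the matched text, so replacements.get(matcher.group(1)) returns null, and without a final matcher.appendTail(buffer) the text after the last match is never copied into the buffer.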

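Clarification: this has to be fast, and there are more elements than the ones listed. I want to parse the string only once, so calling replaceAll for each regex and replacement is not an option. Neither is unmarshalling the XML into an object, replacing all the values in code, and then marshalling the object back to XML.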
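One possible way to meet the single-pass requirement, as a minimal sketch rather than a definitive implementation: compile one pattern that captures the element name in an alternation, use a backreference for the closing tag, and rebuild the element from the captured name. It assumes the sensitive elements contain plain text only (no nested tags, attributes, or CDATA) and that element names contain no regex metacharacters. The class name SinglePassSanitiser and the method sanitise are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassSanitiser {
    // Element names from the question; extend the list as needed.
    private static final String[] ELEMENTS = {"password", "number", "cardValidationNum", "pin"};

    // One alternation over all names; \1 forces the closing tag to match the opening one.
    // [^<]* keeps each match inside a single element (assumes plain-text content).
    private static final Pattern PATTERN =
            Pattern.compile("<(" + String.join("|", ELEMENTS) + ")>[^<]*</\\1>");

    static String sanitise(String xml) {
        Matcher matcher = PATTERN.matcher(xml);
        StringBuffer buffer = new StringBuffer();
        while (matcher.find()) {
            // $1 expands to the captured element name.
            matcher.appendReplacement(buffer, "<$1>---sanitised---</$1>");
        }
        matcher.appendTail(buffer); // copy whatever follows the last match
        return buffer.toString();
    }

    public static void main(String[] args) {
        String xml = "<card><number>1234123412341234</number>"
                + "<expDate>0718</expDate><pin>1234</pin></card>";
        // prints <card><number>---sanitised---</number><expDate>0718</expDate><pin>---sanitised---</pin></card>
        System.out.println(sanitise(xml));
    }
}
```

The explicit loop mirrors the code in the question; PATTERN.matcher(xml).replaceAll("<$1>---sanitised---</$1>") performs the same single scan in one call.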

0 answers:

No answers yet.