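Java's Matcher.appendReplacement() method (together with appendTail()) is supposed to let me transform a source text into a result text while replacing every occurrence of a pattern. In pseudocode, the algorithm looks something like this: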
call Matcher.region()
while Matcher.find() {
    call Matcher.appendReplacement()
}
call Matcher.appendTail()
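This works fine as long as the pattern is searched for within a single region. With a second region, the algorithm becomes: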
call Matcher.region()
while Matcher.find() {
    call Matcher.appendReplacement()
}
call Matcher.region()
while Matcher.find() {
    call Matcher.appendReplacement()
}
call Matcher.appendTail()
The problem arises when, after matching within one region, I want to move the region further along. A complete Java example:
package test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestMatcher {
    public static void main(String[] args) throws Exception {
        String inputText = "dog1 start dog2a dog2b end dog3 start dog4a dog4b end dog5";
        System.out.println("input = " + inputText);
        StringBuffer result = new StringBuffer();
        Pattern pattern = Pattern.compile("dog");
        Matcher matcher = pattern.matcher(inputText);

        // First region: from the first "start" up to the first "end".
        int startPos = inputText.indexOf("start");
        int endPos = inputText.indexOf("end");
        System.out.println("Setting region to " + startPos + "," + endPos);
        matcher.region(startPos, endPos);
        while (matcher.find()) {
            matcher.appendReplacement(result, "cat");
        }
        System.out.println("Partial result = " + result);

        // Second region: the next "start" ... "end" pair.
        startPos = inputText.indexOf("start", endPos);
        endPos = inputText.indexOf("end", startPos);
        System.out.println("Setting region to " + startPos + "," + endPos);
        // region() resets the matcher, so the next appendReplacement()
        // copies the source again from index 0 (this is the problem).
        matcher.region(startPos, endPos);
        while (matcher.find()) {
            matcher.appendReplacement(result, "cat");
        }
        matcher.appendTail(result);
        System.out.println("Final result = " + result);
    }
}
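This does not work, because region() resets the matcher, so Matcher.appendReplacement() starts over from the beginning of the text and the result contains duplicated parts of the source.
This happens by design, as the Javadoc says.
What is the proper way to replace a pattern that may lie within multiple regions?
Edit: added a Java example, removed the text example.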
The Java example above shows that, from an input like
dog1 start dog2a dog2b end dog3 start dog4a dog4b end dog5
you do not get the expected output
dog1 start cat2a cat2b end dog3 start cat4a cat4b end dog5
Output:
input = dog1 start dog2a dog2b end dog3 start dog4a dog4b end dog5
Setting region to 5,23
Partial result = dog1 start cat2a cat
Setting region to 32,50
Final result = dog1 start cat2a catdog1 start dog2a dog2b end dog3 start cat4a cat4b end dog5
Answer 0 (score: 1)
Could the subregions be processed by a separate matcher instead? Something like:
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
public static void main(String[] args) {
    String inputText = "dog1 start dog2a dog2b end dog3 start dog4a dog4b end dog5";
    System.out.println("Input = " + inputText);
    StringBuffer result = new StringBuffer();
    // Outer pattern: each "start ... end" span; group(2) is the text in between.
    Pattern pattern = Pattern.compile("(start(.*?)end)");
    Matcher matcher = pattern.matcher(inputText);
    while (matcher.find()) {
        int s = matcher.start();
        int e = matcher.end();
        System.out.printf("(%d .. %d) -> \"%s\"\n", s, e, matcher.group(1));
        // quoteReplacement() keeps '$' and '\' in the processed text literal.
        matcher.appendReplacement(result,
                Matcher.quoteReplacement(processSubGroup(matcher.group(1), matcher.group(2))));
    }
    matcher.appendTail(result);
    System.out.println("Final result = " + result);
}

// Replaces "dog" with "cat" inside one span, using a matcher of its own.
static String processSubGroup(String subGroup, String contents) {
    StringBuffer result = new StringBuffer();
    Pattern pattern = Pattern.compile("dog");
    Matcher matcher = pattern.matcher(subGroup);
    while (matcher.find()) {
        matcher.appendReplacement(result, "cat");
    }
    matcher.appendTail(result);
    return result.toString();
}
Or, without the logging, more simply:
public static void main(String[] args) {
    String inputText = "dog1 start dog2a dog2b end dog3 start dog4a dog4b end dog5";
    StringBuffer result = new StringBuffer();
    Pattern pattern = Pattern.compile("(start(.*?)end)");
    Matcher matcher = pattern.matcher(inputText);
    while (matcher.find()) {
        // quoteReplacement() keeps '$' and '\' in the processed text literal.
        matcher.appendReplacement(result,
                Matcher.quoteReplacement(processSubGroup(matcher.group(1), matcher.group(2))));
    }
    matcher.appendTail(result);
    System.out.println("Final result = " + result);
}

static String processSubGroup(String subGroup, String contents) {
    return Pattern.compile("dog").matcher(subGroup).replaceAll("cat");
}
Result (from the first version, which logs each match):
Input = dog1 start dog2a dog2b end dog3 start dog4a dog4b end dog5
(5 .. 26) -> "start dog2a dog2b end"
(32 .. 53) -> "start dog4a dog4b end"
Final result = dog1 start cat2a cat2b end dog3 start cat4a cat4b end dog5
Or a more abstract approach:
interface GroupProcessor {
    String process(String group);
}

public static void main(String[] args) {
    String inputText = "dog1 dogs dog2a dog2b enddogs cow1 dog3 cows cow2a cow2b endcows dog4 dogs dog5a dog5b enddogs cow3";
    String result = inputText;
    result = processGroup(result, "dogs*enddogs", (group) -> {
        return Pattern.compile("dog").matcher(group).replaceAll("cat");
    });
    result = processGroup(result, "cows*endcows", (group) -> {
        return Pattern.compile("cow").matcher(group).replaceAll("sheep");
    });
    System.out.println("Input = " + inputText);
    System.out.println("Final result = " + result);
}

// The '*' in `regex` is a placeholder that expands to a reluctant "(.*?)".
static String processGroup(String input, String regex, GroupProcessor processor) {
    StringBuffer result = new StringBuffer();
    Pattern pattern = Pattern.compile(String.format("(%s)", regex.replace("*", "(.*?)")));
    Matcher matcher = pattern.matcher(input);
    while (matcher.find()) {
        // quoteReplacement() keeps '$' and '\' in the processed text literal.
        matcher.appendReplacement(result,
                Matcher.quoteReplacement(processor.process(matcher.group(1))));
    }
    matcher.appendTail(result);
    return result.toString();
}
Which gives us:
Input = dog1 dogs dog2a dog2b enddogs cow1 dog3 cows cow2a cow2b endcows dog4 dogs dog5a dog5b enddogs cow3
Final result = dog1 cats cat2a cat2b endcats cow1 dog3 sheeps sheep2a sheep2b endsheeps dog4 cats cat5a cat5b endcats cow3
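On Java 9 or later, the same nested replacement can probably be written without an explicit append loop, using Matcher.replaceAll(Function<MatchResult, String>). The sketch below is only an illustration under that assumption: the class name NestedReplace and the helper replaceInside are hypothetical, not part of the code above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedReplace {
    // Hypothetical helper: replaces every match of `inner` with the literal
    // `replacement`, but only inside the spans matched by `outer`.
    static String replaceInside(String input, Pattern outer, Pattern inner, String replacement) {
        return outer.matcher(input).replaceAll(span -> {
            // Process the span with a separate matcher.
            String processed = inner.matcher(span.group())
                    .replaceAll(Matcher.quoteReplacement(replacement));
            // The returned string is treated as a replacement template,
            // so quote it to keep '$' and '\' literal.
            return Matcher.quoteReplacement(processed);
        });
    }

    public static void main(String[] args) {
        String input = "dog1 start dog2a dog2b end dog3 start dog4a dog4b end dog5";
        String result = replaceInside(input,
                Pattern.compile("start.*?end"), Pattern.compile("dog"), "cat");
        System.out.println(result);
        // prints: dog1 start cat2a cat2b end dog3 start cat4a cat4b end dog5
    }
}
```

As in the versions above, the regions are found by a pattern rather than set through region(), so the outer matcher's append position is never reset.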
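UPD.
Here is why Matcher.region() resets the matcher's implicit state, and with it lastAppendPosition.
appendReplacement and appendTail are, in a sense, a forward-only mechanism, while .region() is not that deterministic.
Suppose the following: for a 100-character string you apply the region 0..20 and run the find() - appendReplacement() loop, then move the region to, for example, 30..60 and run the replacement loop again.
Now you have the 0..100 source string and, for example, the replacement result for 0..60 in the StringBuffer.
Next you apply the region 10..40 to the source string... and then what? If that region of the source contains no matches, fine, nothing happens. But what if it does contain matches? Where should appendReplacement append or insert the replacement result? The result string has already moved past the 10..40 region, and appendReplacement only appends; it does not replace parts of the string already in the output buffer.
If there were some constraint that only allowed the region to be set to MAX(start, lastAppendPosition)..MIN(end, sourceLength), then the append mechanism could work correctly. But the .region() method has no such restriction, and such a restriction would make .region() useless for searching, which is its main purpose.
That is why .region() resets the implicit matcher state, which makes it less useful for appendReplacement()-related work. If you need different behavior, extend the Matcher class through encapsulation.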
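As a rough illustration of what "extending through encapsulation" could look like: Matcher is a final class, so it has to be wrapped rather than subclassed. The sketch below keeps its own append position, which survives the reset done by region(), and enforces the forward-only constraint described above. The class RegionReplacer and its methods are hypothetical names, not part of java.util.regex, and the replacement is treated as literal text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around Matcher (not a JDK class).
class RegionReplacer {
    private final CharSequence input;
    private final Matcher matcher;
    private final StringBuilder out = new StringBuilder();
    private int appendPos = 0; // first source index not yet copied to `out`

    RegionReplacer(Pattern pattern, CharSequence input) {
        this.input = input;
        this.matcher = pattern.matcher(input);
    }

    // Replaces all matches inside [start, end) with the literal `replacement`.
    // Regions must move forward: a region may not start before appendPos.
    RegionReplacer replaceInRegion(int start, int end, String replacement) {
        if (start < appendPos) {
            throw new IllegalArgumentException("region starts before position " + appendPos);
        }
        matcher.region(start, end); // resets the matcher, but not appendPos
        while (matcher.find()) {
            out.append(input, appendPos, matcher.start()); // untouched gap
            out.append(replacement);
            appendPos = matcher.end();
        }
        return this;
    }

    // Copies the rest of the source, like appendTail().
    String finish() {
        out.append(input, appendPos, input.length());
        appendPos = input.length();
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "dog1 start dog2a dog2b end dog3 start dog4a dog4b end dog5";
        RegionReplacer replacer = new RegionReplacer(Pattern.compile("dog"), text);
        int start = text.indexOf("start");
        int end = text.indexOf("end");
        replacer.replaceInRegion(start, end, "cat");
        start = text.indexOf("start", end);
        end = text.indexOf("end", start);
        replacer.replaceInRegion(start, end, "cat");
        System.out.println(replacer.finish());
        // prints: dog1 start cat2a cat2b end dog3 start cat4a cat4b end dog5
    }
}
```

Rejecting a region that starts before the current append position is one possible policy; a different wrapper could clamp the region start to that position instead.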