The following string is assigned to the variable fromFile:
<!DOCTYPE html>
<!--
To change this license header, choose License Headers in Project Properties.
To change this template file, choose Tools | Templates
and open the template in the editor.
-->
<html>
<head>
<title>TODO supply a title</title>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
</head>
<body>
<div>TODO write content</div>
<span class="test"></span>
<ruby>
text1<rp>(</rp><rt>textA</rt><rp>)</rp>
text2<rp>(</rp><rt>textB</rt><rp>)</rp>
text3<rp>(</rp><rt>textC</rt><rp>)</rp>
</ruby>
<img src="images/aaaaa.jpg">
<img src="./audio/bbbbb.mp3">
<img src="../../audio/ccccc.mp3">
<img class="aaaa">
<input class="bbbb">
<audio controls>
<source src="horse.ogg" type="audio/ogg">
<source src="horse.mp3" type="audio/mpeg">
Your browser does not support the audio element.
</audio>
</body>
</html>
And my regular expressions are:
final Pattern pattern = Pattern.compile("(<rt>(.+?)</rt>)|(?=(\\b(\\w*\\S)\\b)<rp>)");
final Pattern pattern2 = Pattern.compile("(?=(\\b(\\w*\\S)\\b)<rp>)");
final Matcher matcher = pattern.matcher(fromFile);
final Matcher matcher2 = pattern2.matcher(fromFile);
while (matcher.find()) {
    matcher2.find();
    fromFile = "<font class=\"ruby\" title=\"" + matcher.group(1) + "\"" + ">" + matcher2.group(1) + "</font>";
    break;
}
if (!matcher.find()) {
    System.out.println(fromFile);
}
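I would like to achieve this with a single regular expression that produces the same output. The first regular expression extracts the elements inside <rt></rt>, and the second captures the text immediately before the <rp> tag. I then read the extracted values from matcher.group(1) and matcher2.group(1).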
Answer 0 (score: 0)
By parsing the input line by line, you can write a single regular expression that matches both strings on the same line:
Pattern pattern = Pattern.compile("(\\S+)<rp>.*<rt>(\\S+)<\\/rt>.*");
Full code:
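import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// The statements below belong in a method that declares `throws IOException`.
List<String> lines = null;
try (BufferedReader br = new BufferedReader(new FileReader(new File("pathToFile")))) {
    lines = br.lines().collect(Collectors.toList()); // file content to List<String>
}
// group(1): text before <rp>; group(2): text inside <rt>...</rt>
Pattern pattern = Pattern.compile("(\\S+)<rp>.*<rt>(\\S+)<\\/rt>.*");
for (String line : lines) {
    Matcher matcher = pattern.matcher(line);
    while (matcher.find()) {
        System.out.println(matcher.group(1) + " " + matcher.group(2));
    }
}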
Output:
text1 textA
text2 textB
text3 textC
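If the goal is the <font> markup from the question rather than the plain pairs, the same pattern can probably be fed to Matcher.replaceAll. The following is only a minimal sketch under some assumptions: the class name RubyToFont and the method toFontTags are hypothetical, each ruby pair is assumed to sit on its own line (as in the sample), and the title attribute is assumed to hold only the text inside <rt></rt>, without the tags. Because the pattern ends in .*, anything after </rt> on the same line is dropped from the result.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class RubyToFont {
    // Same pattern as in the answer: group 1 is the text before <rp>,
    // group 2 is the text inside <rt>...</rt>.
    private static final Pattern RUBY_PAIR =
            Pattern.compile("(\\S+)<rp>.*<rt>(\\S+)<\\/rt>.*");

    // Replaces every matching ruby pair with a <font> element.
    // "." does not match line terminators by default, so each match
    // stays within a single line of the input.
    static String toFontTags(String html) {
        return RUBY_PAIR.matcher(html)
                .replaceAll("<font class=\"ruby\" title=\"$2\">$1</font>");
    }

    public static void main(String[] args) {
        String fromFile =
                "text1<rp>(</rp><rt>textA</rt><rp>)</rp>\n"
              + "text2<rp>(</rp><rt>textB</rt><rp>)</rp>\n"
              + "text3<rp>(</rp><rt>textC</rt><rp>)</rp>";
        System.out.println(toFontTags(fromFile));
    }
}
```

With the three sample lines, this should print one <font class="ruby" title="..."> element per line, for example <font class="ruby" title="textA">text1</font> for the first one. Lines that contain no ruby pair are left unchanged.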