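Java regex: replacing text inside a pattern match

Date: 2019-05-30 15:45:58

Tags: java regex string

I am new to Java regular expressions. I have a long string containing text like this (below is only part of the string I want to change):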

href="javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)"
href="javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)"
href="javascript:openWin('Images/DCRMBex_06A_ex06.jpg',480,640)"

I want to replace

Images

with

http://google.com/Images

For example, my output should look like this:

href="javascript:openWin('http://google.com/Images/DCRMBex_01B_ex01.jpg',480,640)"

Here is my Java program:

import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main2 {

    public static void main(String[] args) throws FileNotFoundException {

        Scanner in = new Scanner(new FileReader("C:\\Projects\\input.txt"));

        StringBuilder sb = new StringBuilder();
        while (in.hasNext()) {
            sb.append(in.next());
        }
        String patternString = "href=\"javascript:openWin(.+?)\"";
        Pattern pattern = Pattern.compile(patternString);
        Matcher matcher = pattern.matcher(sb);
        while (matcher.find()) {
            //System.out.println(matcher.group(1));
            //System.out.println(matcher.group(1).replaceAll("Images", "http://google.com/Images"));
            matcher.group(1).replaceAll("Images", "http://google.com/Images");

        }
        System.out.println(sb);
    }
}

Below is my input file (input.txt). This is only part of the file; the whole file is too long to paste here:

 <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_01_ex01.pdf"><b>Example 1: Bible (Rusch)</b></a> � <a href="javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)">Figure 1A. First page of text</a> � <a href="javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)">Figure 1B. Source of supplied title</a></td>
                            <td valign="top">  </td>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_06_ex06.pdf"><b>Example 6: Angelo Carletti</b></a> � <a href="javascript:openWin('Images/DCRMBex_06A_ex06.jpg',480,640)">Figure 6A. Title page</a> � <a href="javascript:openWin('Images/DCRMBex_06B_ex06.jpg',480,640)">Figure 6B. Colophon showing use of i/j and u/v</a></td>
                          </tr>
                          <tr>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_02_ex02.pdf"><b>Example 2: Greek anthology</b></a> � <a href="javascript:openWin('Images/DCRMBex_02A_ex02.jpg',480,640)">Figure 2A. First page of text</a> � <a href="javascript:openWin('Images/DCRMBex_02B_ex02.jpg',480,640)">Figure 2B. Colophon</a></td>
                            <td valign="top">  </td>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_07_ex07.pdf"><b>Example 7: Erasmus</b></a> � <a href="javascript:openWin('Images/DCRMBex_07A_ex07.jpg',480,640)">Figure 7A. Title page</a> � <a href="javascript:openWin('Images/DCRMBex_07B_ex07.jpg',480,640)">Figure 7B. Colophon</a> � <a href="javascript:openWin('Images/DCRMBex_07C_ex07.jpg',640,480)">Figure 7C. Running title</a></td>
                          </tr>
                          <tr>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_03_ex03.pdf"><b>Example 3: Heytesbury</b></a> � <a href="javascript:openWin('Images/DCRMBex_03A_ex03.jpg',480,640)">Figure 3A. Title page</a> � <a href="javascript:openWin('Images/DCRMBex_03B_ex03.jpg',480,640)">Figure 3B. Colophon showing use of i/j and u/v</a></td>
                            <td valign="top">  </td>
                            <td valign="top"><a href="http://www.google.com/cds/desktop/documents/DCRMBex/DCRMBex_08_ex08.pdf"><b>Example 8: Pliny</b></a> � <a href="javascript:openWin('Images/DCRMBex_08A_ex08.jpg',480,640)">Figure 8A. Title page</a> � <a href="javascript:openWin('Images/DCRMBex_08B_ex08.jpg',480,640)">Figure 8B. Colophon</a></td>

Output:

1) System.out.println(matcher.group(1))

('Images/DCRMBex_05_ex05.jpg',480,640)

2) System.out.println(matcher.group(1).replaceAll("Images", "http://google.com/Images"));

('http://google.com/Images/DCRMBex_05_ex05.jpg',480,640)

But when I print my StringBuilder, it shows no replacements. What am I doing wrong here? Any help is appreciated. Thanks.

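2 answers:

Answer 0 (score: 2)

replaceAll returns the modified string; it does not modify the string in place. In this case I would not use java.util.regex directly, but rather replaceAll's support for capture groups: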

Scanner in = new Scanner(new FileReader("C:\\Projects\\input.txt"));
StringBuilder sb = new StringBuilder();
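// Read line by line; Scanner.next() would drop all whitespace between tokens.
while (in.hasNextLine()) {
    sb.append(in.nextLine()).append('\n');
}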
// Modified regex 
String patternString = "(href=\"javascript:openWin\\(')(.+?)(')";

String result = sb.toString().replaceAll(patternString, "$1http://google.com/$2$3");

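If the explicit find() loop from the question is to be kept, one possible approach is Matcher.appendReplacement together with appendTail, which builds a new buffer rather than trying to change the original text. The following is only a minimal sketch under that assumption; the class name AppendReplacementDemo and the method rewrite are hypothetical names chosen for illustration, and the sample input is a shortened line from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AppendReplacementDemo {

    // Same pattern as in the question; the parentheses form capture group 1.
    private static final Pattern LINK =
            Pattern.compile("href=\"javascript:openWin(.+?)\"");

    // Rebuilds the text, rewriting "Images" only inside matched openWin links.
    static String rewrite(String input) {
        Matcher matcher = LINK.matcher(input);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            // Strings are immutable: replace() returns a new string that must be used.
            String args = matcher.group(1).replace("Images", "http://google.com/Images");
            String replacement = "href=\"javascript:openWin" + args + "\"";
            // quoteReplacement keeps any '$' or '\' from being read as group syntax.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String input =
                "<a href=\"javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)\">Figure 1A</a>";

        // Discarding the return value changes nothing, as in the question.
        input.replaceAll("Images", "http://google.com/Images");
        System.out.println(input);

        // Using the returned text gives the rewritten link.
        System.out.println(rewrite(input));
    }
}
```

StringBuffer is used here because the StringBuilder overload of appendReplacement is only available from Java 9 onward.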

Hope this helps!

Answer 1 (score: 2)

I would suggest using Files.lines() and Java Streams to transform the input. With your actual input you do not need a regex at all:

try (Stream<String> lines = Files.lines(Paths.get("input.txt"))) {
    String result = lines
            .map(line -> line.replace("Images", "http://google.com/Images"))
            .collect(Collectors.joining("\n"));
    System.out.println(result);
}
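The same literal replacement can be tried without a file. The sketch below is an illustration only: the class name LiteralReplaceDemo and the method prefixImages are hypothetical, and the two sample lines are taken from the question. One caveat worth keeping in mind is that String.replace rewrites every occurrence of "Images" in a line, so it is only safe when that text appears nowhere except in the paths to be changed.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class LiteralReplaceDemo {

    // Replaces the literal text "Images" in every line; no regex is involved.
    static String prefixImages(List<String> lines) {
        return lines.stream()
                .map(line -> line.replace("Images", "http://google.com/Images"))
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "href=\"javascript:openWin('Images/DCRMBex_01B_ex01.jpg',480,640)\"",
                "href=\"javascript:openWin('Images/DCRMBex_01A_ex01.jpg',480,640)\"");
        System.out.println(prefixImages(lines));
    }
}
```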

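If you really want to use a regex, I suggest compiling the pattern outside the loop, because String.replaceAll() compiles the pattern internally on every call. Performance is therefore better if Pattern.compile() is not executed for every line: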

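// Compile once, outside the stream. There is no ".*" in the pattern, so every
// link on a line is matched, not only the last one.
Pattern pattern = Pattern.compile("(href=\"javascript:openWin\\(')(Images/)");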
try (Stream<String> lines = Files.lines(Paths.get("input.txt"))) {
    String result = lines
            .map(pattern::matcher)
            .map(matcher -> matcher.replaceAll("$1http://google.com/$2"))
            .collect(Collectors.joining("\n"));
    System.out.println(result);
}

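The replacement uses this regex, which creates two groups (the parts between the parentheses). You can refer to these groups in the replacement string with $index, so $1 inserts the first group.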
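To see the group references at work on a line that holds more than one link, as the lines in input.txt do, here is a small self-contained sketch. It uses a pattern with no ".*" so that each link on the line is matched separately; the class name GroupReferenceDemo, the constant OPEN_WIN_IMAGE, and the method rewriteLine are hypothetical names, and the sample line is shortened from the question's input.

```java
import java.util.regex.Pattern;

public class GroupReferenceDemo {

    // Group 1: the link prefix up to the opening quote.
    // Group 2: the relative "Images/" part of the path.
    private static final Pattern OPEN_WIN_IMAGE =
            Pattern.compile("(href=\"javascript:openWin\\(')(Images/)");

    static String rewriteLine(String line) {
        // $1 and $2 re-insert the captured groups around the new host prefix.
        return OPEN_WIN_IMAGE.matcher(line).replaceAll("$1http://google.com/$2");
    }

    public static void main(String[] args) {
        String line = "<a href=\"javascript:openWin('Images/DCRMBex_07A_ex07.jpg',480,640)\">Figure 7A</a>"
                + " <a href=\"javascript:openWin('Images/DCRMBex_07B_ex07.jpg',480,640)\">Figure 7B</a>";
        System.out.println(rewriteLine(line));
    }
}
```

Because the pattern is held in a static final field, it is compiled once and reused for every line passed to rewriteLine.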

The result in both cases is:

href="javascript:openWin('http://google.com/Images/DCRMBex_01B_ex01.jpg',480,640)"
href="javascript:openWin('http://google.com/Images/DCRMBex_01A_ex01.jpg',480,640)"
href="javascript:openWin('http://google.com/Images/DCRMBex_06A_ex06.jpg',480,640)"