Using regex in Java to replace anything other than Unicode letters

Date: 2018-12-17 05:20:29

Tags: java regex

My text file has the following format, with various kinds of strings, as shown below:

candle
(air-paraffin)
1,000
°c
(1,800
°f)
smoldering
cigarette:
temperature
13%,
wildlife.[14]
johnston,
f.
h.;
keeley,
j.
bibcode:2009sci...324..481b
(http://adsabs.harvard.edu/abs/2009sci...3

I want to remove everything except simple words like the following:

smoldering
temperature

That is, if a word has even a comma after it (e.g. johnston,), I remove it.

I tried removing leading digits with MyString.replaceAll("^\\d", " "), but even that doesn't work.
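A minimal sketch of why this attempt has no visible effect, assuming the whole file was read into one String (the class and method names below are hypothetical). Java Strings are immutable, so replaceAll returns a new String that must be assigned; and without the (?m) (MULTILINE) flag, ^ anchors only at the very start of the input, not at the start of every line.

```java
// Hypothetical demo class: shows the effect of the (?m) flag on ^
// and why the result of replaceAll must be kept.
class AnchorDemo {

    // Without (?m), ^ matches only at the start of the whole input,
    // so at most one digit (the very first character) is replaced.
    static String withoutMultiline(String text) {
        return text.replaceAll("^\\d", " ");
    }

    // With (?m), ^ matches at the start of every line,
    // so the first character of each line is replaced if it is a digit.
    static String withMultiline(String text) {
        return text.replaceAll("(?m)^\\d", " ");
    }

    public static void main(String[] args) {
        String text = "1,000\n13%,\nsmoldering";

        // Discarding the return value leaves text unchanged (Strings are immutable).
        text.replaceAll("^\\d", " ");
        System.out.println(text.equals("1,000\n13%,\nsmoldering")); // true

        // Prints " ,000", "13%," and "smoldering" on separate lines.
        System.out.println(withoutMultiline(text));
        // Prints " ,000", " 3%," and "smoldering" on separate lines.
        System.out.println(withMultiline(text));
    }
}
```

Even with both fixes, this only blanks a single leading digit per line; removing whole lines needs a pattern that matches the entire line, as in the answers.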

3 answers:

Answer 0 (score: 2)

If you load the whole file into memory, line breaks included, you can use a regex like this:

text = text.replaceAll("(?m)^.*[^a-zA-Z\\r\\n].*(?:\\R|$)", "");

Output

candle
smoldering
temperature

For a demo, see regex101.
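A self-contained sketch of the in-memory approach, assuming the file content is already in a String (a few lines of the sample are inlined here instead of reading a file; the class and method names are hypothetical). Note the doubled backslashes: inside a Java string literal the regex escape \R must be written as "\\R", and \R itself requires Java 8 or later.

```java
// Hypothetical class: drops every line that contains a character other than an ASCII letter.
class InMemoryFilter {

    // (?m)            : ^ and $ match at line boundaries
    // ^.*             : start of a line, then any characters
    // [^a-zA-Z\r\n]   : at least one character that is not an ASCII letter
    // .*(?:\R|$)      : rest of the line plus its line break (or the end of input)
    static String keepSimpleWords(String text) {
        return text.replaceAll("(?m)^.*[^a-zA-Z\\r\\n].*(?:\\R|$)", "");
    }

    public static void main(String[] args) {
        String text = String.join("\n",
                "candle", "(air-paraffin)", "1,000", "smoldering",
                "cigarette:", "temperature", "13%,");
        // Prints candle, smoldering and temperature, one per line.
        System.out.print(keepSimpleWords(text));
    }
}
```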

However, it is better to filter while loading the text file:

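import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

Pattern simpleWord = Pattern.compile("\\p{L}+"); // one or more Unicode letters
try (BufferedReader in = Files.newBufferedReader(Paths.get("path/to/file.txt"))) {
    for (String line; (line = in.readLine()) != null; ) {
        if (simpleWord.matcher(line).matches()) { // matches() requires the whole line to be letters
            // found simple word
        }
    }
}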
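The difference between \p{L} (any Unicode letter) and [a-zA-Z] matters for words with accented letters. A small sketch, assuming the same whole-line check as in the loop above; the class name and the sample word "café" are hypothetical, and non-ASCII characters are written as \u escapes so the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

// Hypothetical class comparing a Unicode-aware and an ASCII-only "simple word" test.
class SimpleWordCheck {
    private static final Pattern UNICODE_WORD = Pattern.compile("\\p{L}+"); // any Unicode letter
    private static final Pattern ASCII_WORD = Pattern.compile("[a-zA-Z]+"); // ASCII letters only

    // matches() requires the whole line to consist of letters.
    static boolean isUnicodeWord(String line) {
        return UNICODE_WORD.matcher(line).matches();
    }

    static boolean isAsciiWord(String line) {
        return ASCII_WORD.matcher(line).matches();
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";   // "cafe" spelled with an accented final e
        String degrees = "\u00b0c";  // degree sign + c; the degree sign is a symbol, not a letter
        System.out.println(isUnicodeWord(cafe));         // true
        System.out.println(isAsciiWord(cafe));           // false
        System.out.println(isUnicodeWord(degrees));      // false
        System.out.println(isUnicodeWord("cigarette:")); // false
    }
}
```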

If you want the simple words in a list, Java 8 streams make this simpler:

List<String> simpleWords;
try (Stream<String> lines = Files.lines(Paths.get("path/to/file.txt"))) {
    simpleWords = lines.filter(Pattern.compile("^\\p{L}+$").asPredicate())
                       .collect(Collectors.toList());
}
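Pattern.asPredicate() tests each element with find(), not matches(), which is why the stream version anchors the pattern with ^ and $. A sketch assuming the lines come from an in-memory Stream rather than a file (class and method names are hypothetical). On Java 11 and later, Pattern.asMatchPredicate() performs a whole-string match without explicit anchors.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical class contrasting an anchored and an unanchored asPredicate().
class PredicateDemo {
    // Anchored: the whole line must consist of letters.
    static final Predicate<String> ANCHORED = Pattern.compile("^\\p{L}+$").asPredicate();
    // Unanchored: find() succeeds if the line merely contains a letter.
    static final Predicate<String> UNANCHORED = Pattern.compile("\\p{L}+").asPredicate();

    static List<String> filter(Stream<String> lines, Predicate<String> test) {
        return lines.filter(test).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // [candle, smoldering]
        System.out.println(filter(Stream.of("candle", "cigarette:", "1,000", "smoldering"), ANCHORED));
        // [candle, cigarette:, smoldering]
        System.out.println(filter(Stream.of("candle", "cigarette:", "1,000", "smoldering"), UNANCHORED));
    }
}
```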

Answer 1 (score: 1)

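This solution iterates over the lines of input.txt and writes every line that matches a regex to output.txt. Afterwards, it deletes the original input.txt and renames output.txt to input.txt.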

Class:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.regex.Pattern;

public class ReplaceWithRegex {
    public static void main(String[] args) throws IOException {
        File inputFile = new File("input.txt");
        File outputFile = new File("output.txt");

        try (BufferedReader reader = new BufferedReader(new FileReader(inputFile));
                BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile))) {
            String line = null;
            while ((line = reader.readLine()) != null) {
                if (Pattern.matches("^[a-zA-Z]+$", line)) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
        if (inputFile.delete()) {
            // Rename the output file to the input file
            if (!outputFile.renameTo(inputFile)) {
                throw new IOException("Could not rename output to input");
            }
        } else {
            throw new IOException("Could not delete original input file ");
        }
    }
}

input.txt before execution:

candle
(air-paraffin)
1,000
°c
(1,800
°f)
smoldering
cigarette:
temperature
13%,
wildlife.[14]
johnston,
f.
h.;
keeley,
j.
bibcode:2009sci...324..481b
(http://adsabs.harvard.edu/abs/2009sci...3

input.txt after execution:

candle
smoldering
temperature
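The same filter can also be sketched with java.nio.file, as an alternative rather than a drop-in replacement: it writes the kept lines to a temporary file in the same directory and then moves it over the original with Files.move(..., REPLACE_EXISTING), instead of a separate delete and rename. The class and method names are hypothetical, and Files.readAllLines/Files.write use UTF-8, so a file in another encoding may fail to read.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical java.nio variant of ReplaceWithRegex.
class NioReplaceWithRegex {
    private static final Pattern SIMPLE_WORD = Pattern.compile("[a-zA-Z]+");

    // Keeps only lines made up entirely of ASCII letters and rewrites the file.
    static void filterInPlace(Path input) throws IOException {
        List<String> kept = Files.readAllLines(input).stream() // decodes as UTF-8
                .filter(line -> SIMPLE_WORD.matcher(line).matches())
                .collect(Collectors.toList());

        // Temporary file next to the input, so the move stays on one file system.
        Path dir = input.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "filtered", ".txt");
        Files.write(tmp, kept); // UTF-8, one line per element

        // Replace the original with the filtered copy in a single call.
        Files.move(tmp, input, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        filterInPlace(Paths.get("input.txt"));
    }
}
```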

Answer 2 (score: 0)

Assuming the words are delimited by line breaks:

myString = myString.replaceAll("^[^a-z&&[^A-Z]]*$", "");