My text file has the following format, with different kinds of strings, like this:
candle
(air-paraffin)
1,000
°c
(1,800
°f)
smoldering
cigarette:
temperature
13%,
wildlife.[14]
johnston,
f.
h.;
keeley,
j.
bibcode:2009sci...324..481b
(http://adsabs.harvard.edu/abs/2009sci...3
I want to remove everything except plain words, such as:
smoldering
temperature
In other words, if a word has anything attached to it, even just a comma (e.g. "johnston,"), I want it removed.
I tried removing the numbers at the start with MyString.replaceAll("^\\d", " "), but even that doesn't work.
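One likely reason the attempt above has no visible effect (an assumption, since the surrounding code isn't shown): without the (?m) flag, ^ anchors only at the very start of the whole string, and \d matches a single digit. replaceAll also returns a new string instead of changing MyString in place, so its result has to be assigned. This sketch, with made-up class and method names, contrasts the two patterns:

```java
public class LeadingDigitDemo {
    // Without (?m), ^ matches only at the start of the whole input,
    // and \d replaces at most one digit there.
    static String withoutMultiline(String text) {
        return text.replaceAll("^\\d", " ");
    }

    // With (?m), ^ also matches right after every line break,
    // and \d+ consumes the whole run of leading digits on each line.
    static String withMultiline(String text) {
        return text.replaceAll("(?m)^\\d+", "");
    }

    public static void main(String[] args) {
        String text = "candle\n1,000\n13%,";
        // replaceAll returns a new String; text itself is never modified
        System.out.println(withoutMultiline(text)); // unchanged: first line starts with 'c'
        System.out.println(withMultiline(text));    // candle / ,000 / %,
    }
}
```

Even the multiline version leaves ",000" and "%," behind, which is why the answers drop whole lines instead of stripping characters.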
Answer 0 (score: 2)
If you load the whole file into memory as a single string, line breaks included, you can use a regex like this:
text = text.replaceAll("(?m)^.*[^a-zA-Z\\r\\n].*(?:\\R|$)", "");
Output:
candle
smoldering
temperature
See regex101 for a demo.
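Since the regex101 demo isn't reproduced here, this self-contained sketch runs the same pattern on part of the sample input. The class name WholeTextFilter and the method keepSimpleWords are hypothetical, and the string literal stands in for the loaded file. In Java source the backslashes must be doubled (\\R, not \R), otherwise the code does not compile.

```java
import java.util.Arrays;

public class WholeTextFilter {
    // Deletes every line that contains at least one character other than an
    // ASCII letter, together with its line break. (?m) lets ^ match at the
    // start of each line; \R matches any line-break sequence (Java 8+);
    // $ handles a last line that has no trailing line break.
    static String keepSimpleWords(String text) {
        return text.replaceAll("(?m)^.*[^a-zA-Z\\r\\n].*(?:\\R|$)", "");
    }

    public static void main(String[] args) {
        String text = String.join("\n", Arrays.asList(
                "candle", "(air-paraffin)", "1,000", "\u00b0c", "(1,800", "\u00b0f)",
                "smoldering", "cigarette:", "temperature", "13%,", "wildlife.[14]"));
        // prints candle, smoldering and temperature, one per line
        System.out.print(keepSimpleWords(text));
    }
}
```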
It is better, however, to filter the lines while reading the file:
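import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// inside a method that throws IOException
Pattern simpleWord = Pattern.compile("\\p{L}+"); // one or more Unicode letters
try (BufferedReader in = Files.newBufferedReader(Paths.get("path/to/file.txt"))) {
    for (String line; (line = in.readLine()) != null; ) {
        if (simpleWord.matcher(line).matches()) {
            // found simple word
        }
    }
}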
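To try the line-by-line filter without a file on disk, this sketch swaps the file reader for a StringReader (an assumption made only so it runs as-is) and collects the matching lines into a list. LineFilterDemo and simpleWords are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class LineFilterDemo {
    // One or more Unicode letters; matches() requires the whole line to fit
    private static final Pattern SIMPLE_WORD = Pattern.compile("\\p{L}+");

    // Keeps only lines made entirely of letters. A StringReader stands in for
    // Files.newBufferedReader(...) so the sketch needs no file on disk.
    static List<String> simpleWords(String content) {
        List<String> result = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new StringReader(content))) {
            for (String line; (line = in.readLine()) != null; ) {
                if (SIMPLE_WORD.matcher(line).matches()) {
                    result.add(line);
                }
            }
        } catch (IOException e) {
            // readLine() declares IOException; a StringReader won't throw it in practice
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "candle\n(air-paraffin)\n\u00b0c\nsmoldering\nkeeley,\ntemperature\n";
        System.out.println(simpleWords(content)); // [candle, smoldering, temperature]
    }
}
```

Because \p{L} covers all Unicode letters, a line such as "café" is kept here, whereas the [a-zA-Z] patterns in the other answers would drop it.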
If you want the simple words collected into a list, Java 8 streams make this shorter:
List<String> simpleWords;
try (Stream<String> lines = Files.lines(Paths.get("path/to/file.txt"))) {
    // asPredicate() uses find(), so the ^...$ anchors are required here
    simpleWords = lines.filter(Pattern.compile("^\\p{L}+$").asPredicate())
                       .collect(Collectors.toList());
}
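The ^ and $ anchors in the stream version matter: Pattern.asPredicate() tests with find(), so an unanchored pattern accepts any line that merely contains a letter. This sketch (hypothetical class name, an in-memory list instead of Files.lines) shows the difference:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamFilterDemo {
    // asPredicate() is true if the pattern is found ANYWHERE in the line
    static final Predicate<String> ANCHORED = Pattern.compile("^\\p{L}+$").asPredicate();
    static final Predicate<String> UNANCHORED = Pattern.compile("\\p{L}+").asPredicate();

    static List<String> filter(List<String> lines, Predicate<String> p) {
        return lines.stream().filter(p).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("candle", "keeley,", "13%,", "temperature");
        System.out.println(filter(lines, ANCHORED));   // [candle, temperature]
        System.out.println(filter(lines, UNANCHORED)); // [candle, keeley,, temperature]
    }
}
```

On Java 11 and later, Pattern.asMatchPredicate() uses matches() instead, so Pattern.compile("\\p{L}+").asMatchPredicate() behaves like the anchored version.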
Answer 1 (score: 1)
This solution iterates over the lines of input.txt and writes those that match a regex to output.txt. It then deletes input.txt and renames output.txt to input.txt.
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.regex.Pattern;

public class ReplaceWithRegex {
    public static void main(String[] args) throws IOException {
        File inputFile = new File("input.txt");
        File outputFile = new File("output.txt");
        try (BufferedReader reader = new BufferedReader(new FileReader(inputFile));
             BufferedWriter writer = new BufferedWriter(new FileWriter(outputFile))) {
            String line = null;
            while ((line = reader.readLine()) != null) {
                if (Pattern.matches("^[a-zA-Z]+$", line)) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        }
        if (inputFile.delete()) {
            // Rename the output file to the input file
            if (!outputFile.renameTo(inputFile)) {
                throw new IOException("Could not rename output to input");
            }
        } else {
            throw new IOException("Could not delete original input file");
        }
    }
}
With the sample input from the question, input.txt ends up containing:
candle
smoldering
temperature
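Deleting the original and then renaming leaves a short window in which input.txt does not exist, and a failed rename leaves the data only in output.txt. A possible alternative, sketched here with java.nio.file rather than the File API used above, writes to a temporary file in the same directory and moves it over the input with StandardCopyOption.REPLACE_EXISTING. The class name, the UTF-8 charset and the temp-file prefix are assumptions.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.regex.Pattern;

public class FilterInPlace {
    private static final Pattern SIMPLE_WORD = Pattern.compile("[a-zA-Z]+");

    // Copies the lines that are plain ASCII words into a temporary file in the
    // same directory, then moves that file over the input (assumes UTF-8 text).
    static void filterInPlace(Path input) throws IOException {
        Path dir = input.toAbsolutePath().getParent();
        Path tmp = Files.createTempFile(dir, "filtered", ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
            for (String line; (line = reader.readLine()) != null; ) {
                if (SIMPLE_WORD.matcher(line).matches()) {
                    writer.write(line);
                    writer.newLine();
                }
            }
        } catch (IOException | RuntimeException e) {
            Files.deleteIfExists(tmp); // don't leave a half-written temp file behind
            throw e;
        }
        // REPLACE_EXISTING overwrites the input, so no separate delete() is needed
        Files.move(tmp, input, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        filterInPlace(Paths.get("input.txt"));
    }
}
```

REPLACE_EXISTING alone does not promise an atomic replacement; StandardCopyOption.ATOMIC_MOVE can be requested where the file system supports it.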
Answer 2 (score: 0)
Assuming each line is handled as a separate string (i.e. the line break is the delimiter):
// empties the line if it contains any character that is not an ASCII letter
myString = myString.replaceAll("^.*[^a-zA-Z].*$", "");
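Applied to a single line, a replaceAll pattern only blanks that line if it matches the whole line. This sketch (hypothetical class and method names) compares ^.*[^a-zA-Z].*$, which empties any line containing a non-letter, with ^[^a-zA-Z]*$, which empties only lines that have no letters at all:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PerLineReplaceDemo {
    // Empties the line if it contains at least one non-ASCII-letter character
    static String clean(String line) {
        return line.replaceAll("^.*[^a-zA-Z].*$", "");
    }

    // Empties the line only if it contains no ASCII letters at all,
    // so a line like "(air-paraffin)" survives
    static String noLettersOnly(String line) {
        return line.replaceAll("^[^a-zA-Z]*$", "");
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("candle", "(air-paraffin)", "1,000", "smoldering");
        List<String> kept = lines.stream()
                .map(PerLineReplaceDemo::clean)
                .filter(s -> !s.isEmpty()) // drop the emptied lines
                .collect(Collectors.toList());
        System.out.println(kept); // [candle, smoldering]
    }
}
```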