Keep only the readable characters from a large txt file

Time: 2015-09-02 12:12:14

Tags: java regex

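I have a large .txt file that contains a mix of readable and unreadable characters. I am trying to write a Java program that creates a new .txt file containing only the readable characters from the original file. Please help me do this. Any code would be greatly appreciated. I am new to Java.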

1 answer:

Answer 0 (score: 0):

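If by readable you mean the letters 'a' to 'z' (in either case), the digits '0' to '9', and spaces, you can filter out everything else with a simple character loop (a regular expression would work as well):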

public static String removeSpecialCharacters(String sentence) {
    // StringBuilder that collects the characters we keep
    StringBuilder stringB = new StringBuilder();
    // loop through all the characters of the sentence
    for (char c : sentence.toCharArray()) {
        // keep only digits, ASCII letters, and spaces
        if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == ' ' ) {
            stringB.append(c);
        }
    }
    // note: the result is also converted to lower case
    return stringB.toString().toLowerCase();
}
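
Since the question is tagged regex, here is a minimal sketch of the same filter written with a regular expression. The class name RegexFilter is made up for this illustration, and Locale.ROOT is my addition so that lower-casing does not depend on the machine's locale; otherwise it is meant to behave like the loop above.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this illustration.
class RegexFilter {

    // Matches every character that is NOT an ASCII digit, an ASCII letter, or a space.
    private static final Pattern UNWANTED = Pattern.compile("[^0-9A-Za-z ]");

    // Regex counterpart of the loop: drop unwanted characters, then lower-case the rest.
    static String removeSpecialCharacters(String sentence) {
        return UNWANTED.matcher(sentence).replaceAll("").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // \u00e9 is an accented e, which the filter drops.
        System.out.println(removeSpecialCharacters("H\u00e9llo, World! 42")); // prints "hllo world 42"
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the expression for every line, which matters when the input file is large.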

You can append the returned String to the new .txt file. In other words, pass every line you read from the old file through removeSpecialCharacters() and append the return value to the new .txt file.

Following the standard Java documentation on reading and writing files, we can put together the following program:

import static java.nio.file.StandardOpenOption.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.io.*;


public class Main {

    public static void main(String[] args) {

        readFromFile();

    }

    private static void writeToFile(String line) {
        // Filter the line first, then add the line break. If the "\n" went
        // through the filter it would be stripped, and the whole output
        // would end up on a single line.
        byte[] data = (removeSpecialCharacters(line) + "\n").getBytes(StandardCharsets.US_ASCII);
        Path p = Paths.get("/home/user/Desktop/outFile.txt");

        // CREATE makes the file if it is missing; APPEND adds to its end,
        // so delete an old outFile.txt before running the program again.
        try (OutputStream out = new BufferedOutputStream(Files.newOutputStream(p, CREATE, APPEND))) {
            out.write(data, 0, data.length);
        } catch (IOException x) {
            System.err.println(x);
        }
    }

    private static void readFromFile() {
        Path file = Paths.get("/home/user/Desktop/inFile.txt");
        try (InputStream in = Files.newInputStream(file);
            BufferedReader reader =
              new BufferedReader(new InputStreamReader(in))) {
            String line = null;
            // readLine() returns each line without its line terminator
            while ((line = reader.readLine()) != null) {
                writeToFile(line);
            }
        } catch (IOException x) {
            System.err.println(x);
        }
    }

    public static String removeSpecialCharacters(String sentence) {
        // StringBuilder that collects the characters we keep
        StringBuilder stringB = new StringBuilder();
        // loop through all the characters of the sentence
        for (char c : sentence.toCharArray()) {
            // keep only digits, ASCII letters, and spaces
            if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == ' ' ) {
                stringB.append(c);
            }
        }
        // note: the result is also converted to lower case
        return stringB.toString().toLowerCase();
    }
}
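
The program above reopens the output file for every line, which can be slow on a large input. As a rough sketch of an alternative, the version below opens each file once and streams line by line. The names StreamingFilter, keepReadable and filterFile are made up for this illustration, and two choices are my own assumptions rather than part of the answer: case is preserved instead of lower-cased, and the input is decoded as ISO-8859-1 so that arbitrary bytes cannot make the read fail.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class name, used only for this illustration.
class StreamingFilter {

    // Keeps ASCII digits, ASCII letters and spaces; unlike the answer's method, case is preserved.
    static String keepReadable(String line) {
        StringBuilder sb = new StringBuilder(line.length());
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == ' ') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Reads 'in' line by line and writes the filtered lines to 'out'.
    // Each file is opened exactly once; an existing 'out' is overwritten.
    static void filterFile(Path in, Path out) throws IOException {
        // ISO-8859-1 maps every byte to exactly one char, so no byte sequence is
        // rejected by the decoder; non-ASCII chars are dropped by the filter anyway.
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.ISO_8859_1);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.US_ASCII)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(keepReadable(line));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java StreamingFilter <inFile> <outFile>");
            return;
        }
        filterFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Because only one line is held in memory at a time, this approach should cope with files much larger than the available heap, as long as the file actually contains line breaks.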