How to convert a multi-line file into a single-line file delimited by a control character

Asked: 2014-07-22 17:39:12

Tags: java regex eclipse parsing stringbuilder

I am trying to parse a file like this:

TextFile.txt

_=1406048396605
bh=1244
bw=1711
c=24
c19=DashboardScreen
c2=2014-07-22T10:00:00-0700
c4=64144090210294
c40=3#undefined#0#0#a=-2#512#-1#0
c41=14060470498427c3e4ed
c46=Green|firefox|Firefox|30|macx|Mac OS X
c5=NONFFA
c6=HGKhjgj
c7=OFF_SEASON|h:PARTIAL|
ch=YHtgsfT
g=https://google.hello.com
h5=77dbf90c-5794-4a40-b1ab-fe1c82440c68-1406048401346
k=true
p=Shockwave Flash;QuickTime Plug-in 7.7.3;Default Browser Helper;SharePoint Browser Plug-in;Java Applet Plug-in;Silverlight Plug-In
pageName=DashboardScreen - Loading...
pageType= 
pe=lnk_o
pev2=pageDetail
s=2432x1520
server=1.1 pqalmttws301.ie.google.net:81
t=22/06/2014 10:00:00 2 420
v12=3468337910
v4=0
v9=dat=279333:279364:375870:743798:744035:743802:744033:743805:783950:783797:783949:784088
vid=29E364C5051D2894-400001468000F0EE

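into something like this:

_=1406048396605<CONTROL_CHARACTER_HERE>bh=1244<CONTROL_CHARACTER_HERE>bw=1711<CONTROL_CHARACTER_HERE>c=24<CONTROL_CHARACTER_HERE>c19=DashboardScreen<CONTROL_CHARACTER_HERE>c2=2014-07-22T10:00:00-0700.....etc

So I basically want to turn a multi-line file into a single-line file, with each field separated by a CONTROL_CHARACTER.

This is what I have so far: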

private String putIntoExpectedFormat() { 

    File f1 = new File("InputFile.txt");
    File f2 = new File("OutputFile.txt"); 

    InputStream in = new FileInputStream(f1);
    OutputStream out = new FileOutputStream(f2); 

    StringBuilder sb = new StringBuilder();

    byte[] buf = new byte[1024];
    int len;

    while( (len=in.read(buf)) > 0) {



        out.write(buf,0,len);
    }

    in.close();
    out.close();

}

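I'm not even sure I'm going about this the right way. Does anyone know how to do this?

4 Answers:

Answer 0 (score: 2)

Since this is a text file, you should use a Reader class to read it as a character stream. For better performance, use BufferedReader. From its Javadoc:

  "Reads text from a character-input stream, buffering characters so as to provide for the efficient reading of characters, arrays, and lines."

You can also use the Java 7 try-with-resources statement so that both files are closed automatically.

Sample code: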

try (BufferedReader reader = new BufferedReader(new FileReader(
        new File("InputFile.txt")));
     BufferedWriter writer = new BufferedWriter(new FileWriter(
        new File("OutputFile.txt")))) {
    String line = null;
    while ((line = reader.readLine()) != null) {
        writer.write(line);
        // write your <CONTROL_CHARACTER_HERE> as well
    }
}
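The loop above leaves the delimiter step as a comment. A minimal sketch of one way to fill it in follows, assuming the control character is SOH (U+0001) and that no delimiter is wanted after the last line; the question names neither, and the class name JoinLinesExample and method joinLines are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JoinLinesExample {

    // Assumption: SOH (U+0001) stands in for <CONTROL_CHARACTER_HERE>.
    public static final char DELIMITER = '\u0001';

    // Copies input to output, writing the delimiter between lines
    // instead of the original line breaks (none after the last line).
    public static void joinLines(Path input, Path output, char delimiter) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (!first) {
                    writer.write(delimiter);
                }
                writer.write(line);
                first = false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JoinLinesExample <input> <output>");
            return;
        }
        joinLines(Paths.get(args[0]), Paths.get(args[1]), DELIMITER);
    }
}
```

Because readLine() strips "\n", "\r" and "\r\n" alike, this sketch should not need to know which line-ending style the input uses. The UTF-8 charset is also an assumption; change it to match the input file.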

Answer 1 (score: 1)

The simplest way is to use Scanner and PrintWriter:

    Scanner in = null;
    PrintWriter out = null;
    try {
        // init input, output
        in = new Scanner(new File("InputFile.txt"));
        out = new PrintWriter(new File("OutputFile.txt"));
        // read input file line by line
        while (in.hasNextLine()) {
            out.print(in.nextLine());
            if (in.hasNextLine()) {
                out.print("<CONTROL_CHARACTER>");
            }
        }
    } finally {
        // close input, output
        if (in != null) {
            in.close();
        }
        if (out != null) {
            out.close();
        }
    }

Answer 2 (score: 1)

Here are three pieces of code that read a file, replace all line breaks with <CONTROL_CHARACTER>, and then write the result to a file.

Read the file:

public static String readFile(String filePath) {
    // StringBuilder avoids re-copying the whole text on every line
    StringBuilder entireFile = new StringBuilder();

    File file = new File(filePath);

    if (file.exists()) {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                // Every line, including the last, gets a trailing "\n"
                entireFile.append(line).append("\n");
            }
        } catch (IOException e) {
            // Also covers FileNotFoundException, a subclass of IOException
            e.printStackTrace();
        }
    } else {
        System.err.println("File " + filePath + " does not exist!");
    }

    return entireFile.toString();
}

Replace the line breaks with <Control-Character>:

String text = readFile("Path/To/file.txt");
text = text.replace("\n", <Control-Character-Here>);
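As a concrete version of the replace step, here is a small sketch. It assumes the text comes from readFile() above, which appends "\n" after every line including the last, so the trailing newline is dropped first to avoid a dangling delimiter at the end. The class name ReplaceNewlinesExample, the method toSingleLine, and the choice of U+0001 as the control character are illustrative assumptions, not part of the answer.

```java
public class ReplaceNewlinesExample {

    // Assumption: SOH (U+0001) stands in for <Control-Character-Here>.
    public static final String DELIMITER = "\u0001";

    // Replaces every "\n" with the delimiter, ignoring one trailing "\n".
    public static String toSingleLine(String text, String delimiter) {
        if (text.endsWith("\n")) {
            text = text.substring(0, text.length() - 1);
        }
        // String.replace treats both arguments literally (no regex involved)
        return text.replace("\n", delimiter);
    }

    public static void main(String[] args) {
        // A visible delimiter is used here only so the result can be read on a console
        System.out.println(toSingleLine("k=true\npe=lnk_o\n", "|")); // prints k=true|pe=lnk_o
    }
}
```

In real use the call would be something like toSingleLine(readFile("Path/To/file.txt"), DELIMITER).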

Write the file:

writeToFile("Path/to/newfile.txt", text);

Here is the writeToFile() method:

public static void writeToFile(String filePath, String toWrite) {
    File file  = new File(filePath);
    if (!file.exists()) {
        try {
            file.createNewFile();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            System.err.println(filePath + " does not exist. Failed to create new file");
        }
    }

    try {
        PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter(filePath, true)));
        out.println(toWrite);
        out.close();
    } catch (IOException e) {
        System.err.println("Could not write to file: " + filePath);
    }
}

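Answer 3 (score: 0)

  • Uses Guava 17.0.
  • Works for small files. Not tested on large or very large files. Judging from the question, I assume the expected input file is small.
  • We don't process each line here, so there is no need to read the file line by line.

Another approach, using the Guava IO library: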

    public static void main(String[] args) {
        try {
            // Change the charset to match your input file.
            String content = Files.toString(new File("/home/chandrayya/InputFile.txt"), Charsets.UTF_8);
            // "\r\n" is the Windows line ending; use "\n" for UNIX/OS X or "\r" for old Mac files.
            // "<C>" stands for the control character.
            content = content.replaceAll("\r\n", "<C>");
            Files.write(content, new File("/home/chandrayya/OutputFile.txt.txt"), Charsets.UTF_8);
        } catch( IOException e ) {
            e.printStackTrace();
        }
    }
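The replaceAll call above handles one line-ending style at a time. If the style of the input is not known in advance, a single regular expression can cover all three. The sketch below uses only the JDK rather than Guava; the class name NormalizeLineEndingsExample and the method joinLines are hypothetical names chosen for illustration.

```java
import java.util.regex.Matcher;

public class NormalizeLineEndingsExample {

    // Replaces Windows (\r\n), old Mac (\r) and UNIX (\n) line breaks with the delimiter.
    // "\r\n" is listed first so that a Windows line break becomes one delimiter, not two.
    public static String joinLines(String content, String delimiter) {
        // quoteReplacement keeps characters such as '$' in the delimiter literal
        return content.replaceAll("\\r\\n|\\r|\\n", Matcher.quoteReplacement(delimiter));
    }

    public static void main(String[] args) {
        // A visible delimiter is used here only so the result can be read on a console
        System.out.println(joinLines("a=1\r\nb=2\nc=3\rd=4", ";")); // prints a=1;b=2;c=3;d=4
    }
}
```

Note that a file ending in a line break will produce a trailing delimiter with this approach; trim it beforehand if that matters for the consumer of the output.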