In Java

Time: 2016-01-28 17:31:07

Tags: java csv zip

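I am writing a CSV file that I need to zip, and I am using java.util.zip.ZipEntry and java.util.zip.ZipOutputStream.

This all works well when every column contains only Western characters, but when I use Korean characters the \n is not recognized and everything ends up on the same line. I am writing the data as UTF-8 and expected that to cover Korean.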

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class CreateCSV {

public static void main(String[] args) throws IOException {

    DateTime utcDateTime = new DateTime().toDateTime(DateTimeZone.UTC);
    DateTime newDateTime = utcDateTime.toDateTime();
    DateTimeFormatter dateFormatter = DateTimeFormat.forPattern("yyyyMMdd-HHmmss-SSS-");
    File zipFile = new File("C:/TestCSVKorean/"+ dateFormatter.print(newDateTime) + "Export.zip");

    FileOutputStream fileOutputStream = new FileOutputStream(zipFile);
    // Open up the zipfile and create the csv entry
    ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(fileOutputStream));
    zos.putNextEntry(new ZipEntry(dateFormatter.print(newDateTime) + "tics.csv"));
    // The first line of the CSV is a header line
    StringBuffer csvHeader = new StringBuffer(
            "Time,Name,Rev,Appme,EnvName,PlanName,PlanRev,"
                + "Og Name,Op Name,"
              + "TTR,Lat,Bys Rec,Bytt,"
                    + "RPl,Request Method,URI Path,Query String,HTTP Status Code,"
                    + "HTTP Request Headers,User Agent,Request Body,HTTP Response Headers,Response Body\n");
    zos.write(csvHeader.toString().getBytes(), 0, csvHeader.length());

    StringBuffer csvData = new StringBuffer("");

    csvData.append("\"" + newDateTime + "\",\"" +
            "apiName" + "\"," +
            "2.0.0" + ",\"" + 
            "app name" + "\",\"" +
            "env name" + "\",\"" +
            "plan name" + "\"," +
            "2" + ",\"" +
            "dev org name" + "\",\"" +
            "ìº˜ë¦°ë” ëª©ë¡ì¡°íšŒ(ë‚´ 캘린ë”, 구ë…가능한 캘린ë”, 시스템 캘린ë”, ê´€ë¦¬ìž ìº˜ë¦°ë”)" + "\"," +
            "123" + "," +
            "inifd: 334;dshs: 343" + ", " +
            "10" + "," + 
            "33" + ",\"" + 
            "http" + "\",\"" + 
            "GET" + "\",\"" + 
            "/dsfs/sdf/ds" + "\",\"" + "query string" + "\",\"" + 
            "200" + "\",\"" + 
            "jshkshdf" + "\",\"" + 
            "sdjhfks/sdfs/" + "\",\"" +                             
            "jhksdfhks dsfs" + "\",\"" +    
            "dsfsdfs" + "\",\"" +       
            "dsfsfs" + "\"\n"); 

    zos.write(csvData.toString().getBytes("UTF-8"), 0, csvData.length());

    csvData = new StringBuffer("");

    csvData.append("\"" + newDateTime + "\",\"" +
            "apiName" + "\"," +
            "2.0.0" + ",\"" + 
            "app name" + "\",\"" +
            "env name" + "\",\"" +
            "plan name" + "\"," +
            "2" + ",\"" +
            "dev org name" + "\",\"" +
            "ìº˜ë¦°ë” ëª©ë¡ì¡°íšŒ(ë‚´ 캘린ë”, 구ë…가능한 캘린ë”, 시스템 캘린ë”, ê´€ë¦¬ìž ìº˜ë¦°ë”)" + "\"," +
            "123" + "," +
            "inifd: 334;dshs: 343" + ", " +
            "10" + "," + 
            "33" + ",\"" + 
            "http" + "\",\"" + 
            "GET" + "\",\"" + 
            "/dsfs/sdf/ds" + "\",\"" + "query string" + "\",\"" + 
            "200" + "\",\"" + 
            "jshkshdf" + "\",\"" + 
            "sdjhfks/sdfs/" + "\",\"" +                             
            "jhksdfhks dsfs" + "\",\"" +    
            "dsfsdfs" + "\",\"" +       
            "dsfsfs" + "\"\n"); 


    zos.write(csvData.toString().getBytes("UTF-8"), 0, csvData.length());

    zos.close();

}

}

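This is what I see when I open the csv file:

Time Name Rev Appme EnvName PlanName PlanRev Og Name Op Name TTR Lat Bys Rec Bytt RPl Request Method URI Path Query String HTTP Status Code HTTP Request Headers User Agent Request Body HTTP Response Headers Response Body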
2016-01-28T17:20:56.859Z apiName 2.0.0 app name env name plan name 2 dev orgnameúËËÃð ÃÅ¡''(ÓÂÃÂÂÂËËÃÂë« °°•••Ã 123 inifd:334; dshs:343 10 33 http 2016-01-28T17:20:56.859Z apiName 2.0.0 app name env name plan name 2 dev org name ÃÂÂËËÃÂàê¼ “ÃÂÂËËÃá¼ žÂú˜ëÂ|°«€123 123 inifd:334; dshs:343 10 33 http

It pastes the date field of the second row into the Request Method field of the first row: 2016-01-28T17:20:56.859Z

1 Answer:

Answer 0: (score: 2)

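First, you should get out of the habit of using StringBuffer. It is an obsolete class. If you need to append text piece by piece, StringBuilder is normally used instead.

In your case, however, you need neither StringBuilder nor StringBuffer. Just use a String: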

String csvHeader =
        "Time,Name,Rev,Appme,EnvName,PlanName,PlanRev,"
            + "Og Name,Op Name,"
          + "TTR,Lat,Bys Rec,Bytt,"
                + "RPl,Request Method,URI Path,Query String,HTTP Status Code,"
                + "HTTP Request Headers,User Agent,Request Body,HTTP Response Headers,Response Body\n";

and...

String csvData = "\"" + newDateTime + "\",\"" +
        "apiName" + "\"," +
        "2.0.0" + ",\"" + 
        "app name" + "\",\"" +
        // etc.

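Second, be careful not to confuse the number of bytes with the number of characters. When a String is converted to bytes using the UTF-8 charset, any character outside the US-ASCII range (0-127) is converted to more than one byte. The number of bytes will therefore be greater than the String's length, which is the number of characters it contains, not the number of bytes it occupies when encoded as UTF-8. Passing csvData.length() as the byte count cuts the byte array short, which is why the \n at the end of each row never reaches the file.

So your write operation should be: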

zos.write(csvData.toString().getBytes("UTF-8"));
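
To make the truncation concrete, here is a minimal sketch. The class name ByteCountDemo, the helper truncatedLikeOriginal, and the sample row are all made up for illustration; the Hangul text is written with Unicode escapes so the encoding of the source file does not matter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; the row content is made up for illustration.
class ByteCountDemo {

    // Mimics zos.write(bytes, 0, row.length()): keeps only as many BYTES
    // as the string has CHARS, which cuts off the end of non-ASCII rows.
    static byte[] truncatedLikeOriginal(String row) {
        byte[] bytes = row.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOfRange(bytes, 0, row.length());
    }

    public static void main(String[] args) {
        // A quoted field with two Hangul syllables, then an ASCII field and a newline.
        String row = "\"\uce98\ub9b0\",123\n";
        byte[] all = row.getBytes(StandardCharsets.UTF_8);
        byte[] cut = truncatedLikeOriginal(row);

        System.out.println("chars: " + row.length());
        System.out.println("bytes: " + all.length);
        System.out.println("full ends with newline: " + (all[all.length - 1] == '\n'));
        System.out.println("truncated ends with newline: " + (cut[cut.length - 1] == '\n'));
    }
}
```

The sample row has 9 chars but 13 UTF-8 bytes, so writing only the first 9 bytes stops at the comma and drops both the last field and the newline.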

Third, I don't know Korean, but I do know what Hangul characters look like, and I don't see any in your code. I assume you intended this to be Korean:

"ìº˜ë¦°ë” ëª©ë¡ì¡°íšŒ(ë‚´ 캘린ë”, 구ë…가능한 캘린ë”, 시스템 캘린ë”, ê´€ë¦¬ìž ìº˜ë¦°ë”)" + "\"," +

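It looks like you are on Windows and have put each individual UTF-8 byte into the String as if it were a character. But in Java, bytes are not characters and are not interchangeable with characters.

I assume you are using Windows because the third character, the spacing Unicode character SMALL TILDE \u02dc, normally takes two bytes, but in the windows-1252 encoding it is the single byte 0x98.

So, if I assume you derived these characters from the UTF-8 bytes of Korean characters, the first six bytes of the above string are: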

ec ba 98 eb a6 b0
ì  º  ˜  ë  ¦  °

Those bytes are the UTF-8 representation of two Korean characters, U+CE98 and U+B9B0. The correct way to put those two characters in a Java String is:

"\uce98\ub9b0"

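If you have the original Hangul text in a file, you can easily convert the whole text to a series of Java escape sequences (like the line above) using the native2ascii tool that ships with the JDK (up to Java 8; the tool was removed in JDK 9). Such a command might look like this: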

native2ascii -encoding UTF-8 hangul.txt hangulstrings.java
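
As a quick check of the byte values and escape sequences discussed above, the following sketch decodes the six bytes as UTF-8 and compares the result with the escaped String. The class and method names (HangulBytesDemo, decodeSixBytes) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class checking the six byte values listed in the answer.
class HangulBytesDemo {

    // Decodes the bytes ec ba 98 eb a6 b0 as UTF-8.
    static String decodeSixBytes() {
        byte[] utf8 = {
            (byte) 0xec, (byte) 0xba, (byte) 0x98,
            (byte) 0xeb, (byte) 0xa6, (byte) 0xb0
        };
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = decodeSixBytes();
        // Six bytes decode to two chars, one per Hangul syllable.
        System.out.println("chars: " + decoded.length());
        System.out.println("matches escapes: " + decoded.equals("\uce98\ub9b0"));
    }
}
```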

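Another approach, which I do not recommend, is available if you don't want to bother writing your strings correctly: force the current "pseudo-bytes" string to be interpreted as UTF-8 bytes, by recognizing that it contains windows-1252 characters which stand for bytes and converting them back to those bytes: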

zos.write(csvData.getBytes("windows-1252"));

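The resulting zip entry will still be encoded as UTF-8, because your bytes are the UTF-8 representation of Korean text. So you will need to make sure you open the file with a tool that recognizes it as UTF-8.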
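
For illustration only (this is the workaround, not the recommended fix), a small sketch of that round trip is shown here. It assumes the JVM provides the windows-1252 charset, which is not one of the charsets the Java specification guarantees; MojibakeDemo and repair are hypothetical names, and the pseudo-bytes string holds just the first six characters of the string from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; assumes the windows-1252 charset is available.
class MojibakeDemo {

    // Encodes the "pseudo-bytes" string as windows-1252 to get the original
    // bytes back, then decodes those bytes as the UTF-8 they really are.
    static String repair(String pseudoBytes) {
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] originalBytes = pseudoBytes.getBytes(cp1252);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The six windows-1252 characters standing in for ec ba 98 eb a6 b0.
        String pseudo = "\u00ec\u00ba\u02dc\u00eb\u00a6\u00b0";
        String repaired = repair(pseudo);
        System.out.println("repaired matches Hangul: " + repaired.equals("\uce98\ub9b0"));
    }
}
```

This only works if every pseudo-byte character survived intact; any that was lost before it reached the String cannot be restored this way.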

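Windows is not particularly good at recognizing UTF-8 files. Notepad is especially bad. One way to signal to Windows that a file is UTF-8 is to write the byte order mark character as the first character in the file: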

String csvHeader = "\ufeff"
        + "Time,Name,Rev,Appme,EnvName,PlanName,PlanRev,"
        // etc.
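
Putting the pieces together, a minimal sketch of the corrected write might look like the following. It writes to an in-memory buffer instead of a file so it can be run anywhere; the names BomCsvZipDemo, zipCsv and firstEntry, the entry name, and the shortened header and rows are assumptions made for the example, not the original code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; header and rows are shortened placeholders.
class BomCsvZipDemo {

    // Writes a BOM-prefixed header and the rows into one UTF-8 zip entry,
    // always passing the complete byte array to write().
    static byte[] zipCsv(String entryName, String header, String... rows) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            zos.putNextEntry(new ZipEntry(entryName));
            zos.write(("\ufeff" + header).getBytes(StandardCharsets.UTF_8));
            for (String row : rows) {
                zos.write(row.getBytes(StandardCharsets.UTF_8));
            }
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Reads the first entry of the zip back as raw bytes.
    static byte[] firstEntry(byte[] zip) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zip))) {
            zis.getNextEntry();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = zis.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] zip = zipCsv("tics.csv", "Time,Op Name\n",
                "\"t1\",\"\uce98\ub9b0\"\n",
                "\"t2\",\"\uce98\ub9b0\"\n");
        byte[] csv = firstEntry(zip);
        String text = new String(csv, StandardCharsets.UTF_8);
        // The BOM shows up as the bytes ef bb bf at the start of the entry.
        System.out.printf("first bytes: %02x %02x %02x%n",
                csv[0] & 0xff, csv[1] & 0xff, csv[2] & 0xff);
        System.out.println("lines: " + text.split("\n").length);
    }
}
```

Using StandardCharsets.UTF_8 rather than the "UTF-8" string avoids the checked UnsupportedEncodingException, and the header is encoded with the same charset as the rows instead of the platform default.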