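I am writing a CSV file that I need to compress, and I am using java.util.zip.ZipEntry and java.util.zip.ZipOutputStream.
This all works fine when every column contains Western characters, but when I use Korean characters the \n is not recognized and everything shows up on the same line. I am writing the data as UTF-8 and expected that to cover Korean.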
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;
import org.joda.time.DateTime;
import org.joda.time.DateTimeZone;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;
public class CreateCSV {
public static void main(String[] args) throws IOException {
DateTime utcDateTime = new DateTime().toDateTime(DateTimeZone.UTC);
DateTime newDateTime = utcDateTime.toDateTime();
DateTimeFormatter dateFormatter = DateTimeFormat.forPattern("yyyyMMdd-HHmmss-SSS-");
File zipFile = new File("C:/TestCSVKorean/"+ dateFormatter.print(newDateTime) + "Export.zip");
FileOutputStream fileOutputStream = new FileOutputStream(zipFile);
// Open up the zipfile and create the csv entry
ZipOutputStream zos = new ZipOutputStream(new BufferedOutputStream(fileOutputStream));
zos.putNextEntry(new ZipEntry(dateFormatter.print(newDateTime) + "tics.csv"));
// The first line of the CSV is a header line
StringBuffer csvHeader = new StringBuffer(
"Time,Name,Rev,Appme,EnvName,PlanName,PlanRev,"
+ "Og Name,Op Name,"
+ "TTR,Lat,Bys Rec,Bytt,"
+ "RPl,Request Method,URI Path,Query String,HTTP Status Code,"
+ "HTTP Request Headers,User Agent,Request Body,HTTP Response Headers,Response Body\n");
zos.write(csvHeader.toString().getBytes(), 0, csvHeader.length());
StringBuffer csvData = new StringBuffer("");
csvData.append("\"" + newDateTime + "\",\"" +
"apiName" + "\"," +
"2.0.0" + ",\"" +
"app name" + "\",\"" +
"env name" + "\",\"" +
"plan name" + "\"," +
"2" + ",\"" +
"dev org name" + "\",\"" +
"ìº˜ë¦°ë” ëª©ë¡ì¡°íšŒ(ë‚´ 캘린ë”, 구ë…가능한 캘린ë”, 시스템 캘린ë”, ê´€ë¦¬ìž ìº˜ë¦°ë”)" + "\"," +
"123" + "," +
"inifd: 334;dshs: 343" + ", " +
"10" + "," +
"33" + ",\"" +
"http" + "\",\"" +
"GET" + "\",\"" +
"/dsfs/sdf/ds" + "\",\"" + "query string" + "\",\"" +
"200" + "\",\"" +
"jshkshdf" + "\",\"" +
"sdjhfks/sdfs/" + "\",\"" +
"jhksdfhks dsfs" + "\",\"" +
"dsfsdfs" + "\",\"" +
"dsfsfs" + "\"\n");
zos.write(csvData.toString().getBytes("UTF-8"), 0, csvData.length());
csvData = new StringBuffer("");
csvData.append("\"" + newDateTime + "\",\"" +
"apiName" + "\"," +
"2.0.0" + ",\"" +
"app name" + "\",\"" +
"env name" + "\",\"" +
"plan name" + "\"," +
"2" + ",\"" +
"dev org name" + "\",\"" +
"ìº˜ë¦°ë” ëª©ë¡ì¡°íšŒ(ë‚´ 캘린ë”, 구ë…가능한 캘린ë”, 시스템 캘린ë”, ê´€ë¦¬ìž ìº˜ë¦°ë”)" + "\"," +
"123" + "," +
"inifd: 334;dshs: 343" + ", " +
"10" + "," +
"33" + ",\"" +
"http" + "\",\"" +
"GET" + "\",\"" +
"/dsfs/sdf/ds" + "\",\"" + "query string" + "\",\"" +
"200" + "\",\"" +
"jshkshdf" + "\",\"" +
"sdjhfks/sdfs/" + "\",\"" +
"jhksdfhks dsfs" + "\",\"" +
"dsfsdfs" + "\",\"" +
"dsfsfs" + "\"\n");
zos.write(csvData.toString().getBytes("UTF-8"), 0, csvData.length());
zos.close();
}
}
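This is what I see when I open the csv file:
Time Name Rev Appme EnvName PlanName PlanRev Og Name Op Name TTR Lat Bys Rec Bytt RPl Request Method URI Path Query String HTTP Status Code HTTP Request Headers User Agent Request Body HTTP Response Headers Response Body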
2016-01-28T17:20:56.859Z apiName 2.0.0 app name env name plan name 2 dev orgnameúËËÃð ÃÅ¡''(ÓÂÃÂÂÂËËÃÂë« °°•••Ã 123 inifd:334; dshs:343 10 33 http 2016-01-28T17:20:56.859Z apiName 2.0.0 app name env name plan name 2 dev org name ÃÂÂËËÃÂàê¼ “ÃÂÂËËÃá¼ žÂú˜ëÂ|°«€123 123 inifd:334; dshs:343 10 33 http
It runs the date field of the second row into the Request Method field of the first row: 2016-01-28T17:20:56.859Z
Answer 0 (score: 2)
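First, you should get out of the habit of using StringBuffer. It is an outdated class. If you need to append text piece by piece, StringBuilder is normally used instead.
In your case, however, you need neither StringBuilder nor StringBuffer. Just use a String: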
String csvHeader =
"Time,Name,Rev,Appme,EnvName,PlanName,PlanRev,"
+ "Og Name,Op Name,"
+ "TTR,Lat,Bys Rec,Bytt,"
+ "RPl,Request Method,URI Path,Query String,HTTP Status Code,"
+ "HTTP Request Headers,User Agent,Request Body,HTTP Response Headers,Response Body\n";
and...
String csvData = "\"" + newDateTime + "\",\"" +
"apiName" + "\"," +
"2.0.0" + ",\"" +
"app name" + "\",\"" +
// etc.
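Second, be careful not to confuse the number of bytes with the number of characters. When a String is converted to bytes using the UTF-8 charset, any character outside the US-ASCII range (0-127) is converted to more than one byte. The byte count will therefore be larger than the String's length, because length() reports the number of characters (chars) the String contains, not the number of bytes it occupies once encoded as UTF-8. Passing csvData.length() as the byte count cuts off the end of the encoded data, including the trailing \n.
So your write operation should be: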
zos.write(csvData.toString().getBytes("UTF-8"));
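To make the byte/character mismatch concrete, here is a minimal sketch. The class name, method names, and sample strings are hypothetical and only serve the illustration; the second method mimics what write(bytes, 0, s.length()) would send by keeping only the first s.length() bytes of the encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; its name and the sample strings are illustrative only.
class ByteCountDemo {

    // Number of bytes the string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // The bytes that write(bytes, 0, s.length()) would actually send:
    // only the first s.length() bytes of the UTF-8 encoding.
    static byte[] bytesWrittenWithCharCount(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        return Arrays.copyOfRange(encoded, 0, s.length());
    }

    public static void main(String[] args) {
        String ascii = "GET\n";
        String hangul = "\uce98\ub9b0\n"; // two Hangul syllables plus a newline

        // For pure ASCII text the two counts agree, which hides the bug.
        System.out.println(ascii.length() + " chars, " + utf8Length(ascii) + " bytes");

        // Each Hangul syllable needs three bytes in UTF-8, so the counts diverge.
        System.out.println(hangul.length() + " chars, " + utf8Length(hangul) + " bytes");

        // Truncating to the char count drops the tail of the data, newline included.
        byte[] truncated = bytesWrittenWithCharCount(hangul);
        boolean endsWithNewline = truncated[truncated.length - 1] == '\n';
        System.out.println("truncated data ends with newline: " + endsWithNewline);
    }
}
```

Calling write(byte[]) with no offset and length, as in the line above, avoids the problem because the array's own length is used.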
Third, I don't know Korean, but I know what Hangul characters look like, and I don't see any in your code. I assume you intended this to be Korean text:
"ìº˜ë¦°ë” ëª©ë¡ì¡°íšŒ(ë‚´ 캘린ë”, 구ë…가능한 캘린ë”, 시스템 캘린ë”, ê´€ë¦¬ìž ìº˜ë¦°ë”)" + "\"," +
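It looks like you are on Windows and have put each individual UTF-8 byte into the String as though it were a character. But in Java, bytes are not characters and are not interchangeable with them.
I assume Windows because the third character, the spacing Unicode character SMALL TILDE, is \u02dc, which normally takes two bytes in UTF-8 but is the single byte 0x98 in the windows-1252 encoding.
So, if I assume you derived those characters from the UTF-8 bytes of Korean text, the first six bytes of the string above are: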
ec ba 98 eb a6 b0
ì º ˜ ë ¦ °
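One way to check which characters those six bytes stand for is to decode them as UTF-8 and print the resulting code points. This is only a sketch; the class and method names are made up for the illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class HangulBytesDemo {

    // Interprets the given bytes as UTF-8 encoded text.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The six bytes listed above.
        byte[] bytes = {
            (byte) 0xec, (byte) 0xba, (byte) 0x98,
            (byte) 0xeb, (byte) 0xa6, (byte) 0xb0
        };
        String text = decodeUtf8(bytes);

        // Print each decoded char as a Unicode code point.
        for (int i = 0; i < text.length(); i++) {
            System.out.printf("U+%04X%n", (int) text.charAt(i));
        }
    }
}
```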
Those bytes are the UTF-8 representation of the two Korean characters U+CE98 and U+B9B0. The correct way to put those two characters in a Java String is:
"\uce98\ub9b0"
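If you have a file containing the original Hangul text, you can easily convert the whole thing into a series of Java escape sequences (like the line above) with the native2ascii tool that ships with the JDK (through JDK 8; the tool was removed in JDK 9). Such a command might look like this: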
native2ascii -encoding UTF-8 hangul.txt hangulstrings.java
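If native2ascii is not available, a similar conversion can be sketched in a few lines of Java. This is not the tool's implementation, just an approximation under the assumption that every char outside US-ASCII should become a four-digit escape; the class and method names are hypothetical.

```java
// Hypothetical helper that approximates what native2ascii does for
// chars outside the US-ASCII range; names are illustrative only.
class UnicodeEscapeDemo {

    // Replaces every char above 127 with a backslash-u escape of four hex digits.
    static String toUnicodeEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 128) {
                out.append(c); // plain ASCII passes through unchanged
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The two Hangul syllables from the example, followed by ASCII text.
        System.out.println(toUnicodeEscapes("\uce98\ub9b0 ok"));
    }
}
```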
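An alternative, which I don't recommend, for when you don't want to bother writing your strings properly: force the data to be treated as UTF-8 bytes by recognizing that your current "pseudo-bytes" String contains windows-1252 characters that stand for bytes, and converting it back to those bytes: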
zos.write(csvData.getBytes("windows-1252"));
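The resulting zip entry will still be UTF-8 encoded, because your bytes are the UTF-8 representation of the Korean text. So you need to make sure you open the file with a tool that recognizes it as UTF-8.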
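As a rough sketch of that round trip, the following converts a "pseudo-bytes" String back to bytes via windows-1252 and then decodes those bytes as UTF-8. The class and method names are hypothetical. Note the assumption that every original byte survived as a character: windows-1252 leaves a few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) unassigned, so text whose UTF-8 form used those bytes may already be damaged.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; names are illustrative only.
class MojibakeRepairDemo {

    // Turns each windows-1252 character back into its single byte,
    // then reads the resulting byte sequence as UTF-8.
    static String repair(String pseudoBytes) {
        byte[] raw = pseudoBytes.getBytes(Charset.forName("windows-1252"));
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The first six "pseudo-byte" characters of the string in the question,
        // written as escapes so this source file stays pure ASCII.
        String pseudo = "\u00ec\u00ba\u02dc\u00eb\u00a6\u00b0";
        String repaired = repair(pseudo);
        for (int i = 0; i < repaired.length(); i++) {
            System.out.printf("U+%04X%n", (int) repaired.charAt(i));
        }
    }
}
```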
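Windows is not particularly good at recognizing UTF-8 files. Notepad is especially poor at it. One way to signal to Windows programs that a file is UTF-8 is to write the byte order mark character as the first character of the file: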
String csvHeader = "\ufeff"
+ "Time,Name,Rev,Appme,EnvName,PlanName,PlanRev,"
// etc.
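Putting the pieces together, here is a minimal sketch that writes a BOM-prefixed, UTF-8 encoded CSV into a zip entry and reads it back to confirm the bytes. It works on in-memory buffers rather than the file path from the question, and the class name, method names, entry name, and sample row are all assumptions made for the illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; names and sample data are illustrative only.
class BomCsvZipDemo {

    // Writes the CSV text, prefixed with a byte order mark, as one UTF-8 zip entry.
    static byte[] zipCsv(String entryName, String csvText) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            zos.putNextEntry(new ZipEntry(entryName));
            // write(byte[]) sends the whole array, so no byte count can go wrong.
            zos.write(("\ufeff" + csvText).getBytes(StandardCharsets.UTF_8));
            zos.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Returns the uncompressed bytes of the first entry in the zip data.
    static byte[] readFirstEntry(byte[] zipBytes) {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            if (zis.getNextEntry() == null) {
                throw new IllegalStateException("zip data contains no entries");
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = zis.read(chunk, 0, chunk.length)) != -1) {
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String csv = "Time,Op Name\n\"2016-01-28\",\"\uce98\ub9b0\"\n";
        byte[] content = readFirstEntry(zipCsv("tics.csv", csv));
        String decoded = new String(content, StandardCharsets.UTF_8);
        System.out.println("starts with BOM: " + (decoded.charAt(0) == '\ufeff'));
        System.out.println("line count: " + decoded.split("\n").length);
    }
}
```

The checked IOException is wrapped in UncheckedIOException only to keep the sketch short; real code writing to a file would more likely declare or handle it.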