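I'm trying to convert a comma-separated CSV file into a tab-separated file using Java. However, a few of the values inside the file contain commas themselves. See the following example: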
Direct - House,Bayer House Advertiser,537121661,,160 x 600,Bayer US Publisher,537121625,Bayer.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000
Direct - House,Bayer House Advertiser,537121661,,160 x 600,Bayer US Publisher,537121625,Bayer.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000
Can anyone help me with how to handle these values?
Thanks.
Answer 0 (score: 2)
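I think your best bet is to rely on a pattern that does not change. You mention that the problem is numbers that use a comma as the thousands separator, and I see that in your lines those numbers are enclosed in double quotes. Assuming, as the comments in the code below do, that each line contains at most one such double-quoted number, you can do the following: locate the first and the last double quote in the line, then replace every comma outside that range with a tab.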
The following code does exactly what you need:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;

public class CsvToTabConvertor {

    public static void main(String[] args) {
        File file = new File("C:\\test_java\\csvtotab.txt");
        List<String> processedLines = new ArrayList<String>();
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                StringBuilder builder = new StringBuilder(line);
                // find the number in double quotes - assuming there is at most one quoted value per line
                int doubleQuoteIndexStart = builder.indexOf("\"");
                int doubleQuoteIndexLast = builder.lastIndexOf("\"");
                // for each line, visit every comma
                int index = builder.indexOf(",");
                while (index >= 0) {
                    // only commas outside the quoted value are separators;
                    // consecutive commas each become a tab, so empty fields are preserved
                    if (index < doubleQuoteIndexStart || index > doubleQuoteIndexLast) {
                        builder.setCharAt(index, '\t');
                    }
                    // get the next index of a comma
                    index = builder.indexOf(",", index + 1);
                }
                // add the line to the list of lines for later storage to file
                processedLines.add(builder.toString());
            }
        } catch (IOException ex) {
            // report the failure instead of silently swallowing it
            ex.printStackTrace();
            return;
        }
        // write all the lines to the output file
        File outFile = new File("C:\\test_java\\csvtotab_processed.txt");
        try (PrintWriter writer = new PrintWriter(outFile)) {
            for (String processedLine : processedLines) {
                writer.println(processedLine);
            }
        } catch (IOException ex) {
            ex.printStackTrace();
        }
    }
}
Given an input file containing the following lines:
Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000
Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000
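The processed output file looks like this (the fields are separated by tab characters, which appear here as spaces):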
Direct - House eBay House Advertiser 537121661 160 x 600 eBay US Publisher 537121625 eBay.com 537224178 160x600_MyeBay_US 538146889 2015-11-18 "8,455,844" 0 0 0 0.000000 USD 0.000000 0.000000 0.000000
Direct - House eBay House Advertiser 537121661 160 x 600 eBay US Publisher 537121625 eBay.com 537224178 160x600_Search_SLR 538146895 2015-11-18 "20,175,240" 30 0 0 0.000000 USD 0.000000 0.000000 0.000000
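Since tabs are hard to tell apart from spaces in a listing, here is a minimal sketch that applies the same per-line rule (commas outside the quoted value become tabs) to a shortened sample line and prints the tabs as a visible `\t`. The class name `LineConverter`, the method `convertLine`, and the shortened sample line are hypothetical and only serve this illustration.

```java
// Hypothetical helper: the per-line rule from the answer, extracted so it can
// be tried without reading or writing any files.
final class LineConverter {

    // Replaces every comma outside the first..last double quote with a tab.
    // Assumes at most one double-quoted value per line.
    static String convertLine(String line) {
        StringBuilder builder = new StringBuilder(line);
        int quoteStart = builder.indexOf("\"");
        int quoteEnd = builder.lastIndexOf("\"");
        int index = builder.indexOf(",");
        while (index >= 0) {
            if (index < quoteStart || index > quoteEnd) {
                builder.setCharAt(index, '\t');
            }
            index = builder.indexOf(",", index + 1);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Shortened sample with an empty field (,,) and a quoted number
        String sample = "537121661,,160 x 600,2015-11-18,\"8,455,844\",0";
        // Show each tab as the two characters \t so it is visible
        System.out.println(convertLine(sample).replace("\t", "\\t"));
        // prints: 537121661\t\t160 x 600\t2015-11-18\t"8,455,844"\t0
    }
}
```

The empty field shows up as two tabs in a row, so the column count stays the same as in the input.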
Modify the code above and its logic to suit any further requirements.
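One such requirement might be lines with more than one quoted value. With the first/last-quote approach, a separator comma sitting between two quoted values would be left untouched. A possible variation, not part of the answer above, is to scan each line character by character and track whether the scanner is currently inside double quotes. The sketch below assumes that fields contain no line breaks or tab characters; the class name `QuoteAwareCsvToTab` and its `convertLine` method are hypothetical.

```java
// Hypothetical variation: handles any number of double-quoted fields per line.
final class QuoteAwareCsvToTab {

    // Converts one CSV line to a tab-separated line. Commas inside double
    // quotes are kept; the quotes themselves are kept as well.
    static String convertLine(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // An escaped quote ("") toggles the state twice, leaving it unchanged
                inQuotes = !inQuotes;
                out.append(c);
            } else if (c == ',' && !inQuotes) {
                out.append('\t');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "a,\"1,2\",b,\"3,4\"";
        // Show each tab as the two characters \t so it is visible
        System.out.println(convertLine(sample).replace("\t", "\\t"));
        // prints: a\t"1,2"\tb\t"3,4"
    }
}
```

For input that may contain quoted line breaks or other edge cases, a dedicated CSV parsing library is likely a safer choice than hand-written scanning.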