Convert a comma-separated CSV file to a tab-separated file using Java

Time: 2015-11-26 01:17:00

Tags: java csv tabs comma separator

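I am trying to convert a comma-separated CSV file into a tab-separated CSV file using Java. However, a few of the values inside the file themselves contain commas. Please see the following example: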

Direct - House,Bayer House Advertiser,537121661,,160 x 600,Bayer US Publisher,537121625,Bayer.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000

Direct - House,Bayer House Advertiser,537121661,,160 x 600,Bayer US Publisher,537121625,Bayer.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000

Can anyone help me with how to handle these values?

Thanks.

1 Answer:

Answer 0: (score: 2)

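I think your best option is to rely on a pattern that does not change. You mention that the problem is with numbers that use a comma as the thousands separator. I see that in your lines these numbers are enclosed in double quotes. Based on the following assumptions: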

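  1. The number is enclosed in double quotes
  2. There is only one such number per line (if there is more than one, find all the double quotes, store their indexes in an array or list, and check that a comma's index does not fall inside any of those ranges)
  3. Then you do the following:

    1. Get the index of the first double quote, e.g. 154
    2. Get the index of the second/last double quote, e.g. 159
    3. Replace every comma with \t, provided the comma's index is less than the index of the first double quote or greater than the index of the last double quote (this skips the commas inside the number, which must not be replaced with \t)
    4. The following code does exactly what you asked for: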

      import java.io.BufferedReader;
      import java.io.File;
      import java.io.FileReader;
      import java.io.IOException;
      import java.io.PrintWriter;
      import java.util.ArrayList;
      import java.util.List;

      public class CsvToTabConvertor {
          public static void main(String[] args) {
              File file = new File("C:\\test_java\\csvtotab.txt");
              List<String> processedLines = new ArrayList<String>();

              // try-with-resources closes the reader even if reading fails
              try (BufferedReader br = new BufferedReader(new FileReader(file))) {
                  String line;
                  while ((line = br.readLine()) != null) {
                      StringBuilder builder = new StringBuilder(line);

                      // find the number in double quotes - assuming there is only one quoted number per line
                      // (both indexes are -1 when the line has no quotes, so every comma is replaced)
                      int doubleQuoteIndexStart = builder.indexOf("\"");
                      int doubleQuoteIndexLast = builder.lastIndexOf("\"");

                      // for each line, visit every comma
                      int index = builder.indexOf(",");
                      while (index >= 0) {
                          // replace only commas outside the quoted range;
                          // consecutive commas (empty fields) each become a tab
                          if (index < doubleQuoteIndexStart || index > doubleQuoteIndexLast) {
                              builder.setCharAt(index, '\t');
                          }

                          // get next index of comma
                          index = builder.indexOf(",", index + 1);
                      }

                      // add the line to the list of lines for later storage to file
                      processedLines.add(builder.toString());
                  }
              } catch (IOException ex) {
                  // report the failure instead of silently ignoring it
                  ex.printStackTrace();
                  return;
              }

              // write all the lines to the output file
              File outFile = new File("C:\\test_java\\csvtotab_processed.txt");
              try (PrintWriter writer = new PrintWriter(outFile)) {
                  for (String processedLine : processedLines) {
                      writer.println(processedLine);
                  }
              } catch (IOException ex) {
                  ex.printStackTrace();
              }
          }
      }
      

      Input file containing the following lines:

      Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_MyeBay_US,538146889,2015-11-18,"8,455,844",0,0,0,0.000000,USD,0.000000,0.000000,0.000000
      Direct - House,eBay House Advertiser,537121661,,160 x 600,eBay US Publisher,537121625,eBay.com,537224178,160x600_Search_SLR,538146895,2015-11-18,"20,175,240",30,0,0,0.000000,USD,0.000000,0.000000,0.000000
      

      The processed output file looks like this:

      Direct - House  eBay House Advertiser   537121661       160 x 600   eBay US Publisher   537121625   eBay.com    537224178   160x600_MyeBay_US   538146889   2015-11-18  "8,455,844" 0   0   0   0.000000    USD 0.000000    0.000000    0.000000
      Direct - House  eBay House Advertiser   537121661       160 x 600   eBay US Publisher   537121625   eBay.com    537224178   160x600_Search_SLR  538146895   2015-11-18  "20,175,240"    30  0   0   0.000000    USD 0.000000    0.000000    0.000000
      

      Modify the code above and its logic to suit any further requirements.
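
Assumption 2 above mentions lines that contain more than one quoted number. One way to cover that case without storing index ranges is to track whether the current character is inside double quotes. This is only a minimal sketch under stated assumptions: the class name `QuoteAwareCsvToTab` and the method `toTabSeparated` are hypothetical names, not part of the original answer, and the sketch assumes that quotes are balanced within each line and that no quoted field spans several lines.

```java
// Hypothetical helper, not part of the original answer.
class QuoteAwareCsvToTab {

    // Replaces every comma that lies outside double quotes with a tab.
    // Assumes the quotes in a line are balanced. The quotes themselves are
    // kept, so the output has the same shape as the answer's output.
    static String toTabSeparated(String line) {
        StringBuilder out = new StringBuilder(line.length());
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '"') {
                // entering or leaving a quoted field
                inQuotes = !inQuotes;
                out.append(c);
            } else if (c == ',' && !inQuotes) {
                // field separator: swap it for a tab
                out.append('\t');
            } else {
                // ordinary character, or a comma inside quotes
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // sample line with two quoted numbers and one empty field
        String sample = "2015-11-18,\"8,455,844\",0,,\"20,175,240\",USD";
        System.out.println(toTabSeparated(sample));
    }
}
```

Calling `toTabSeparated` on each line read from the file, in place of the index-based loop, would leave the rest of the program unchanged. For files with escaped quotes, embedded tabs, or multi-line fields, a dedicated CSV parser is likely a safer choice than either approach.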