Java: Detecting the delimiter of a CSV or TXT file

Date: 2019-09-28 06:04:12

Tags: java csv delimiter file-handling

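I have seen this question asked several times, but the answers use other languages and I could not follow them.

I am receiving a .csv or .txt file over a socket. Is there a way to detect the delimiter (separator) used in a line of a CSV or TXT file?

Here is the server code that handles writing the file: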

try {
    final ServerSocket server = new ServerSocket(8998);
    socket = server.accept();
    File sdcard = Environment.getExternalStorageDirectory();
    File myFile = new File(sdcard, "TestReceived" + curDate + ".csv");

    final BufferedReader br = new BufferedReader(new InputStreamReader(socket.getInputStream()));
    final PrintWriter pw = new PrintWriter(new FileWriter(myFile));

    String line;
    String[] wordsarray;

    // Column positions, looked up from the header row below
    int bc = 0;
    int dc = 0;
    int pq = 0;
    int rq = 0;
    int id = 0;

    // Header row: the delimiter is hard-coded as a comma
    line = br.readLine();
    wordsarray = line.split(",");

    for (int x = 0; x < wordsarray.length; x++) {
        switch (wordsarray[x]) {
            case "COLUMN NAME A": id = x;
                break;
            case "COLUMN NAME B": bc = x;
                break;
            case "COLUMN NAME C": dc = x;
                break;
            case "COLUMN NAME D": pq = x;
                break;
            case "COLUMN NAME E": rq = x;
                break;
        }
    }
    pw.println(wordsarray[dc] + "\t" + wordsarray[rq] + "\t" + wordsarray[pq] + "\t" + wordsarray[bc] + "\t" + wordsarray[id]);

    // Remaining rows: write the same columns, reordered and tab-separated
    for (line = br.readLine(); line != null; line = br.readLine()) {
        wordsarray = line.split(",");
        pw.println(wordsarray[dc] + "\t" + wordsarray[rq] + "\t" + wordsarray[pq] + "\t" + wordsarray[bc] + "\t" + wordsarray[id]);
    }
    pw.flush();
    pw.close();
    br.close();
    socket.close();
    server.close();
} catch (Exception e) {
    e.printStackTrace();
}

If I hard-code a comma in line.split() and the file uses a different delimiter, the output contains repeated rows, and I don't even know why this happens:

COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E

However, if the file's delimiter matches the comma, the output is correct:

 COLUMN NAME A COLUMN NAME B COLUMN NAME C COLUMN NAME D COLUMN NAME E

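Is there a way to detect a file's delimiter automatically, so that I don't have to worry about which one the file uses? Or is there a better solution?

1 answer:

Answer 0: (score: 0)

Use a BufferedReader, set a mark(...), then read the first line. If that line contains a \t tab character, your file is tab-delimited; otherwise assume it is comma-delimited.

Then call reset() to rewind the reader and parse the file with a CSV/TSV parser such as Apache Commons CSV: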

try (BufferedReader in = Files.newBufferedReader(Paths.get(filename))) {
    in.mark(1024);
    String line = in.readLine();
    if (line == null)
        throw new IOException("File is empty: " + filename);
    CSVFormat fileFormat = (line.indexOf('\t') != -1 ? CSVFormat.TDF
                                                     : CSVFormat.RFC4180)
            .withHeader();
    in.reset();

    for (CSVRecord record : fileFormat.parse(in)) {
        String lastName = record.get("Last Name");
        String firstName = record.get("First Name");
        ...
    }
}
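
If the file might use a delimiter other than a tab or a comma, the same first-line check can be generalised by counting a few candidate characters and picking the most frequent one. The following is only a minimal sketch using the plain JDK: the class name DelimiterSniffer, its methods, and the candidate list (comma, tab, semicolon, pipe) are assumptions made for illustration and are not part of the answer above or of Apache Commons CSV. It does not handle quoted fields that contain a delimiter; a real CSV parser is still the safer choice for that.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (an illustration only): guesses a delimiter from a header line. */
final class DelimiterSniffer {

    // Candidate delimiters; on a tie the earlier entry wins (an assumption of this sketch).
    private static final char[] CANDIDATES = {',', '\t', ';', '|'};

    private DelimiterSniffer() { }

    /** Returns the candidate that occurs most often in the line; falls back to a comma. */
    static char detect(String headerLine) {
        char best = ',';
        int bestCount = 0;
        for (char candidate : CANDIDATES) {
            int count = 0;
            for (int i = 0; i < headerLine.length(); i++) {
                if (headerLine.charAt(i) == candidate) {
                    count++;
                }
            }
            // Strictly greater, so earlier candidates keep priority on equal counts.
            if (count > bestCount) {
                best = candidate;
                bestCount = count;
            }
        }
        return best;
    }

    /** Splits on the delimiter literally (no regex meaning) and keeps trailing empty fields. */
    static String[] split(String line, char delimiter) {
        return line.split(Pattern.quote(String.valueOf(delimiter)), -1);
    }
}
```

In the question's server code, this would mean calling DelimiterSniffer.detect(line) once on the header row and then using DelimiterSniffer.split(line, delimiter) in place of each line.split(","). Pattern.quote matters here because String.split takes a regular expression, and a bare "|" would otherwise be read as alternation rather than as a literal pipe.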