Java - Reading a large .txt data file in batches of 10

Date: 2017-04-18 08:04:51

Tags: java file

I have a large data file, say dataset.txt, in which the data has the following format:

1683492079 kyra maharashtra 18/04/2017 10:16:17
1644073389 pam delhi 18/04/2017 10:16:17
.......

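The fields are id, name, state, and timestamp.

The .txt data file contains about 50,000 lines.

My requirement is to read the data from this file in batches of 10.

So the first batch should read elements 0 to 9, the next batch elements 10 to 19, and so on.

Using BufferedReader, I have managed to read the whole file: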

import java.io.*;
public class ReadDataFile {
    public static void main(String args[]) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("dataset.txt"));
        String line;
        while((line = br.readLine())!= null)
        {
           System.out.println(line);
        }
        br.close();
    }
}

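But my requirement is to read the file in batches of 10. I am new to Java, so I would really appreciate it if someone could help me in simple terms.

Based on @GhostCat's answer, this is what I have now: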

import java.io.*;
import java.util.ArrayList;
import java.util.List;

public class ReadDataFile {
    public static void main(String args[]) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("dataSetExample.txt"));
        readBatch(br, 10);
    }

    public static void readBatch(BufferedReader reader, int batchSize) throws IOException {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            String line = reader.readLine();
            if (line != null) {
                // result.add(line);
                System.out.println(line);
            }
        }
        // return result;
        return;
    }
}

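The file is read inside the readBatch method, so how do I know in the main method whether the end of the file has been reached, so that I can ask for the next 10 records? Please help.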

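2 answers:

Answer 0: (score: 4)

Your requirements aren't really clear, but here is something simple to get you started:

A) Your main method should not do any reading; it just prepares the BufferedReader object.

B) You use that reader with a method like this: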

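// Reads up to batchSize lines. A shorter (possibly empty) list means the end of the file was reached.
// The parameter must be a BufferedReader: a plain Reader has no readLine() method.
private static List<String> readBatch(BufferedReader reader, int batchSize) throws IOException {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < batchSize; i++) {
        String line = reader.readLine();
        if (line != null) {
            result.add(line);
        } else {
            return result;
        }
    }
    return result;
}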

To use it in your main:

BufferedReader reader = ...
int batchSize = 10;
boolean moreLines = true;
while (moreLines) {
    List<String> batch = readBatch(reader, batchSize);
    ... do something with that list
    if (batch.size() < batchSize) {
        moreLines = false;
    }
}
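
For anyone who wants to run this end to end, here is a minimal sketch that puts the two pieces together. The class name BatchReadSketch, the forEachBatch helper and the StringReader standing in for the real file are my own assumptions for illustration; they are not part of the answer. It also skips the empty last batch that the loop above hands you when the number of lines is an exact multiple of the batch size.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchReadSketch {

    // Reads up to batchSize lines; a shorter (possibly empty) list signals end of input.
    static List<String> readBatch(BufferedReader reader, int batchSize) throws IOException {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            String line = reader.readLine();
            if (line == null) {
                break; // end of input reached
            }
            result.add(line);
        }
        return result;
    }

    // Hypothetical helper: passes every non-empty batch to the handler
    // and returns how many batches were handled.
    static int forEachBatch(BufferedReader reader, int batchSize,
                            Consumer<List<String>> handler) throws IOException {
        int batches = 0;
        boolean moreLines = true;
        while (moreLines) {
            List<String> batch = readBatch(reader, batchSize);
            if (!batch.isEmpty()) {
                handler.accept(batch);
                batches++;
            }
            // A short batch means readLine() already returned null.
            moreLines = batch.size() == batchSize;
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for new BufferedReader(new FileReader("dataset.txt")): 25 fake lines.
        StringBuilder data = new StringBuilder();
        for (int i = 0; i < 25; i++) {
            data.append("line ").append(i).append('\n');
        }
        try (BufferedReader reader = new BufferedReader(new StringReader(data.toString()))) {
            int count = forEachBatch(reader, 10,
                    batch -> System.out.println("Batch of " + batch.size() + " lines"));
            System.out.println(count + " batches in total");
        }
    }
}
```

With 25 input lines and a batch size of 10, the handler is called three times, with 10, 10 and 5 lines.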

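This is meant as a "suggestion" for how you could approach this. Things missing from my answer: you should probably use a distinct class and do the parsing there (and return a List<DataClass> instead of passing around those raw "line strings").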
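
A rough sketch of what such a class could look like, based only on the two sample lines in the question. The class name DataRecord, its field names, and the assumptions that names and states contain no whitespace and that the timestamp is day-first (dd/MM/yyyy HH:mm:ss) are mine, not something the question confirms.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DataRecord {
    // Assumed format, inferred from "18/04/2017 10:16:17" in the sample data.
    private static final DateTimeFormatter TIMESTAMP_FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");

    final long id;
    final String name;
    final String state;
    final LocalDateTime timestamp;

    DataRecord(long id, String name, String state, LocalDateTime timestamp) {
        this.id = id;
        this.name = name;
        this.state = state;
        this.timestamp = timestamp;
    }

    // Parses one line of the form "id name state dd/MM/yyyy HH:mm:ss".
    static DataRecord parse(String line) {
        // Limit of 4 keeps the date and time together in the last element.
        String[] parts = line.trim().split("\\s+", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        return new DataRecord(
                Long.parseLong(parts[0]),
                parts[1],
                parts[2],
                LocalDateTime.parse(parts[3], TIMESTAMP_FORMAT));
    }

    // Converts a batch of raw lines into a batch of parsed records.
    static List<DataRecord> parseBatch(List<String> lines) {
        List<DataRecord> records = new ArrayList<>();
        for (String line : lines) {
            records.add(parse(line));
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList(
                "1683492079 kyra maharashtra 18/04/2017 10:16:17",
                "1644073389 pam delhi 18/04/2017 10:16:17");
        for (DataRecord r : parseBatch(batch)) {
            System.out.println(r.id + " | " + r.name + " | " + r.state + " | " + r.timestamp);
        }
    }
}
```

The batch-reading method can then keep returning List<String>, with parseBatch applied to each batch, or it can call parse directly on each line as it is read.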

And of course: 50,000 lines isn't really much data. Unless we are talking about an embedded device, there is not much point in "batch style" reading here.
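
To illustrate that point: when the whole file fits in memory, you could read every line at once and cut the resulting list into batches afterwards. This is only a sketch under that assumption; the class name InMemoryBatches, the partition helper and the UTF-8 encoding are my own choices, not something taken from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class InMemoryBatches {

    // Splits a list into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // subList returns a view of the original list, not a copy.
            batches.add(items.subList(start, end));
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        // The file name comes from the question; pass another path as the first argument if needed.
        Path file = Paths.get(args.length > 0 ? args[0] : "dataset.txt");
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        List<List<String>> batches = partition(lines, 10);
        System.out.println(lines.size() + " lines in " + batches.size() + " batches");
    }
}
```

Batch k then covers lines 10*k to 10*k + 9, which matches the "0 to 9, 10 to 19" numbering in the question.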

Finally: the term batch processing has a very distinct meaning, in Java as well; if you intend to go there, see here for further reading.

Answer 1: (score: 2)

For anyone who needs a working example:

import java.io.*;
import java.util.ArrayList;
import java.util.List;

// Reads the REST of a batch: up to batchSize - 1 lines.
// The caller (main) has already read the first line of the batch, which is why the loop starts at 1.
private static List<String> readBatch(BufferedReader br, int batchSize) throws IOException {
    // List that will hold the remaining lines of the current batch
    List<String> result = new ArrayList<>();
    for (int i = 1; i < batchSize; i++) {  // i starts at 1: line 0 of the batch was read by the caller
        String line = br.readLine();
        if (line != null) {
            result.add(line);   // collect the line
        } else {
            return result;  // end of file: return the partial batch
        }
    }
    return result;   // full batch collected
}

public static void main(String[] args) {
    // Batch size, i.e. how many lines you want in each batch
    int batchSize = 5;
    long batchNumber = 1;
    // try-with-resources flushes and closes both files, even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader("c://ldap//buffreadstream2.csv"));    // input file
         BufferedWriter bw = new BufferedWriter(new FileWriter("c://ldap//buffreadstream3.csv"))) {  // output file
        String line;
        // readLine() in the loop condition consumes the first line of each batch,
        // so that line has to be handled here or it would be lost
        while ((line = br.readLine()) != null) {
            bw.write("Batch Number # " + batchNumber + "\n");
            System.out.println("Batch Number # " + batchNumber);
            bw.write(line + "\n");
            System.out.println(line);
            // process the first line of the batch here...

            // readBatch() returns the remaining lines of this batch
            List<String> mylist = readBatch(br, batchSize);
            for (String batchLine : mylist) {
                System.out.println(batchLine);
                // process the remaining lines here...
                bw.write(batchLine + "\n");
            }
            batchNumber++;
        }

        System.out.println("Lines are Successfully copied!");
    } catch (IOException e) {
        System.out.println("Exception caught: " + e.getMessage());
    }
}
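
If splitting the read between main and readBatch feels awkward, a single loop with a line counter produces the same kind of output. The following is only a sketch of that variant, not the code from this answer: the class name BatchCopySketch and the copyInBatches method are made up for illustration, and the in-memory StringReader/StringWriter stand in for the two files.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

class BatchCopySketch {

    // Copies every line from in to out and writes a "Batch Number # n" header
    // before each group of batchSize lines. Returns the number of batches written.
    // The caller remains responsible for closing in and out.
    static long copyInBatches(Reader in, Writer out, int batchSize) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        BufferedReader br = new BufferedReader(in);
        BufferedWriter bw = new BufferedWriter(out);
        long lineCount = 0;
        long batchNumber = 0;
        String line;
        while ((line = br.readLine()) != null) {
            // Every batchSize-th line (0, batchSize, 2 * batchSize, ...) starts a new batch.
            if (lineCount % batchSize == 0) {
                batchNumber++;
                bw.write("Batch Number # " + batchNumber + "\n");
            }
            bw.write(line + "\n");
            lineCount++;
        }
        bw.flush(); // push buffered text through to the underlying writer
        return batchNumber;
    }

    public static void main(String[] args) throws IOException {
        // Seven fake lines instead of a real CSV file.
        StringBuilder data = new StringBuilder();
        for (int i = 1; i <= 7; i++) {
            data.append("row ").append(i).append('\n');
        }
        StringWriter result = new StringWriter();
        long batches = copyInBatches(new StringReader(data.toString()), result, 5);
        System.out.print(result);
        System.out.println(batches + " batches written");
    }
}
```

The trade-off is that you never hold a batch as a List, so this fits best when each line can be processed on its own and only the batch boundaries matter.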