I have a large data file, for example dataset.txt, in which the data is formatted like this:
1683492079 kyra maharashtra 18/04/2017 10:16:17
1644073389 pam delhi 18/04/2017 10:16:17
.......
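The fields are id, name, state, and timestamp.
The .txt data file contains about 50,000 lines.
My requirement is to read the data from this file in batches of 10.
So the first batch should contain elements 0 to 9, the next batch elements 10 to 19, and so on.
Using BufferedReader, I have managed to read the whole file: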
import java.io.*;

public class ReadDataFile {
    public static void main(String args[]) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("dataset.txt"));
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
        br.close();
    }
}
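However, my requirement is to read the file in batches of 10. I am new to Java, so I would be very grateful if someone could help in simple terms.
Based on @GhostCat's answer, this is what I have: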
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReadDataFile {
    public static void main(String args[]) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader("dataSetExample.txt"));
        readBatch(br, 10);
    }

    public static void readBatch(BufferedReader reader, int batchSize) throws IOException {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < batchSize; i++) {
            String line = reader.readLine();
            if (line != null) {
                // result.add(line);
                System.out.println(line);
            }
        }
        // return result;
        return;
    }
}
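The file is read inside the readBatch method, so how does the main method know whether the end of the file has been reached, so that it can ask for the next 10 records? Please help.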
Answer 0 (score: 4)
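Your requirements are not really clear, but here is something simple to get you started:
A) Your main method should not do any reading; it just prepares the BufferedReader object.
B) You pass that reader to a method like this one: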
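// The parameter must be a BufferedReader: plain Reader has no readLine() method.
private static List<String> readBatch(BufferedReader reader, int batchSize) throws IOException {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < batchSize; i++) {
        String line = reader.readLine();
        if (line != null) {
            result.add(line);
        } else {
            // end of file: return the (possibly shorter or empty) batch
            return result;
        }
    }
    return result;
}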
To use it in your main method:
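BufferedReader reader = ...
int batchSize = 10;
boolean moreLines = true;
while (moreLines) {
    List<String> batch = readBatch(reader, batchSize);
    // ... do something with that list
    // (it is empty when the file length is an exact multiple of batchSize)
    if (batch.size() < batchSize) {
        // a short batch means the end of the file was reached
        moreLines = false;
    }
}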
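Putting the two pieces above together, a minimal self-contained sketch could look like the following. The class name BatchReadSketch and the in-memory sample are my own assumptions for illustration; with a real file you would wrap new FileReader("dataset.txt") instead of the StringReader. Here the loop stops on an empty batch rather than on a short one, which is just one possible way to signal the end of the file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, chosen for this sketch only.
class BatchReadSketch {

    // Reads up to batchSize lines; returns fewer (possibly zero) at end of input.
    static List<String> readBatch(BufferedReader reader, int batchSize) throws IOException {
        List<String> result = new ArrayList<>();
        String line;
        // The size check comes first, so no line is consumed once the batch is full.
        while (result.size() < batchSize && (line = reader.readLine()) != null) {
            result.add(line);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample data standing in for dataset.txt.
        String sample = "line0\nline1\nline2\nline3\nline4\n";
        int batchSize = 2;
        // try-with-resources closes the reader even if an exception is thrown.
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            List<String> batch;
            int batchNumber = 0;
            // An empty batch means the end of the input was reached.
            while (!(batch = readBatch(reader, batchSize)).isEmpty()) {
                System.out.println("Batch " + batchNumber + ": " + batch);
                batchNumber++;
            }
        }
    }
}
```

Because main only looks at what readBatch returns, it never has to track line numbers itself; asking for the next batch is all it does.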
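This is meant as a "suggestion" of how you could approach this. Things missing from my answer: you should probably use a distinct class and do the parsing there (returning a List<DataClass>
instead of moving those raw "line strings" around).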
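To illustrate the "distinct class" idea, here is a rough sketch of what such a class might look like. The name DataRecord, its fields, and the dd/MM/yyyy HH:mm:ss pattern are assumptions derived from the two sample rows in the question; it also assumes that name and state never contain spaces, which may not hold for the real data.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical value class; name and fields are guessed from the sample rows.
final class DataRecord {
    // Assumed timestamp layout, e.g. 18/04/2017 10:16:17
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd/MM/yyyy HH:mm:ss");

    final long id;
    final String name;
    final String state;
    final LocalDateTime timestamp;

    DataRecord(long id, String name, String state, LocalDateTime timestamp) {
        this.id = id;
        this.name = name;
        this.state = state;
        this.timestamp = timestamp;
    }

    // Parses one whitespace-separated line: id name state date time.
    static DataRecord parse(String line) {
        String[] parts = line.trim().split("\\s+");
        if (parts.length != 5) {
            throw new IllegalArgumentException("Unexpected line format: " + line);
        }
        LocalDateTime ts = LocalDateTime.parse(parts[3] + " " + parts[4], FORMAT);
        return new DataRecord(Long.parseLong(parts[0]), parts[1], parts[2], ts);
    }

    // Converts a batch of raw lines into a batch of parsed records.
    static List<DataRecord> parseBatch(List<String> lines) {
        List<DataRecord> records = new ArrayList<>();
        for (String line : lines) {
            records.add(parse(line));
        }
        return records;
    }

    public static void main(String[] args) {
        DataRecord r = parse("1683492079 kyra maharashtra 18/04/2017 10:16:17");
        System.out.println(r.id + " " + r.name + " " + r.state + " " + r.timestamp);
    }
}
```

With something like this, the batch loop could hand each List<String> to parseBatch and work with typed fields from then on.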
Of course, 50,000 lines is not really much data. Unless we are talking about an embedded device, there is not much point in doing things "batch style".
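If the file really is that small, one alternative is to load all lines at once and cut the list into slices. The sketch below assumes the whole file fits comfortably in memory; the class name InMemoryBatches and the partition helper are hypothetical, while Files.readAllLines and List.subList are standard JDK calls.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, chosen for this sketch only.
class InMemoryBatches {

    // Splits a list into consecutive slices of at most batchSize elements.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // subList returns a view of the original list, not a copy.
            batches.add(items.subList(start, end));
        }
        return batches;
    }

    public static void main(String[] args) throws IOException {
        // The path defaults to the file name used in the question.
        Path path = Paths.get(args.length > 0 ? args[0] : "dataset.txt");
        List<String> lines = Files.readAllLines(path); // decodes as UTF-8
        for (List<String> batch : partition(lines, 10)) {
            System.out.println("Batch of " + batch.size() + " lines");
        }
    }
}
```

This trades memory for simplicity, so it would be a poor fit for files much larger than the one described.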
Finally, the term batch processing has a very distinct meaning, also in Java; if you intend to go there, see here for further reading.
Answer 1 (score: 2)
For anyone who needs a working example:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ReadDataFile {
    // Reads the remaining lines of a batch. The caller has already read the first
    // line of the batch, so this reads at most batchSize - 1 further lines.
    private static List<String> readBatch(BufferedReader br, int batchSize) throws IOException {
        List<String> result = new ArrayList<>();
        for (int i = 1; i < batchSize; i++) { // starts at 1: the first line was read in main
            String line = br.readLine();
            if (line != null) {
                result.add(line);
            } else {
                return result; // end of file: return the partial batch
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // input file
        BufferedReader br = new BufferedReader(new FileReader("c://ldap//buffreadstream2.csv"));
        // output file
        BufferedWriter bw = new BufferedWriter(new FileWriter("c://ldap//buffreadstream3.csv"));
        int batchSize = 5; // how many lines you want in each batch
        String line = null;
        long batchNumber = 1;
        try {
            List<String> mylist = null;
            while ((line = br.readLine()) != null) { // runs once per batch until the file is exhausted
                bw.write("Batch Number # " + batchNumber + "\n");
                System.out.println("Batch Number # " + batchNumber);
                // br.readLine() in the loop condition consumed the first line of this batch,
                // so handle it here; otherwise one line per batch would be lost
                bw.write(line + "\n");
                System.out.println(line);
                mylist = readBatch(br, batchSize); // the remaining lines of this batch
                for (int i = 0; i < mylist.size(); i++) {
                    System.out.println(mylist.get(i));
                    bw.write(mylist.get(i) + "\n"); // write/process each returned line
                }
                batchNumber++;
            }
            System.out.println("Lines are Successfully copied!");
            // once you are done, flush and close the reader and the writer
            br.close();
            br = null;
            bw.flush();
            bw.close();
            bw = null;
        } catch (Exception e) {
            System.out.println("Exception caught: " + e.getMessage());
        }
    }
}
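One thing to watch in the code above: if an exception is thrown inside the loop, the reader and writer are never closed. A possible variant, sketched here under my own assumptions (the class name BatchCopySketch and the method copyInBatches are hypothetical), uses try-with-resources and a single loop with a line counter instead of a separate readBatch call. It accepts any Reader and Writer, so the same method should work for files and for in-memory strings.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical class name, chosen for this sketch only.
class BatchCopySketch {

    // Copies every line from in to out, writing a header line before each
    // group of batchSize lines. Returns the number of batches written.
    static long copyInBatches(Reader in, Writer out, int batchSize) throws IOException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long batchNumber = 0;
        long lineCount = 0;
        // Both streams are closed automatically, even when an exception occurs.
        try (BufferedReader br = new BufferedReader(in);
             BufferedWriter bw = new BufferedWriter(out)) {
            String line;
            while ((line = br.readLine()) != null) {
                if (lineCount % batchSize == 0) {
                    // A new batch starts every batchSize lines.
                    batchNumber++;
                    bw.write("Batch Number # " + batchNumber + "\n");
                }
                bw.write(line + "\n");
                lineCount++;
            }
        }
        return batchNumber;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the input and output files.
        StringWriter out = new StringWriter();
        long batches = copyInBatches(new StringReader("a\nb\nc\n"), out, 2);
        System.out.print(out);
        System.out.println("Batches written: " + batches);
    }
}
```

To copy real files, you would pass a FileReader and a FileWriter in place of the StringReader and StringWriter.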