Basically, I have to process a large CSV file containing nearly 1 million records using multiple threads.
I created an IngestionCallerThread class:
    import java.io.BufferedReader;
    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class IngestionCallerThread {
        public static void main(String[] args) {
            try {
                int count = 0;
                InputStream ios = IngestionCallerThread.class.getClassLoader().getResourceAsStream("aa10.csv");
                byte[] buff = new byte[8000];
                int bytesRead = 0;
                ByteArrayOutputStream bao = new ByteArrayOutputStream();
                while ((bytesRead = ios.read(buff)) != -1) {
                    bao.write(buff, 0, bytesRead);
                }
                byte[] data = bao.toByteArray();
                ByteArrayInputStream bin = new ByteArrayInputStream(data);
                BufferedReader fileInputStreamBufferedReader = new BufferedReader(new InputStreamReader(bin));
                while ((fileInputStreamBufferedReader.readLine()) != null) {
                    count++;
                }
                bin.reset();
                int numberOfThreads = 12;
                int rowsForEachThread = count / numberOfThreads;
                int remRows = count % numberOfThreads;
                int startPosition = 0;
                System.out.println(count);
                ExecutorService es = Executors.newCachedThreadPool();
                for (int i = 0; i < numberOfThreads && startPosition < count; i++) {
                    if (remRows > 0 && i + 1 >= numberOfThreads)
                        rowsForEachThread = remRows;
                    IngestionThread ingThread = new IngestionThread(bin, startPosition, rowsForEachThread);
                    es.execute(ingThread);
                    startPosition = (startPosition + rowsForEachThread);
                }
                es.shutdown();
                if (es.isTerminated()) {
                    System.out.println("Completed");
                }
                // t2.start();
            } catch (IOException e1) {
                // TODO Auto-generated catch block
                e1.printStackTrace();
            }
            catch (Exception e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
It calls another class I wrote, which implements Runnable:
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;

    public class IngestionThread implements Runnable {
        InputStream is;
        long startPosition;
        long length;

        public IngestionThread(InputStream targetStream, long position, long length) {
            this.is = targetStream;
            this.startPosition = position;
            this.length = length;
        }

        @Override
        public void run() {
            // TODO Auto-generated method stub
            int currentPosition = 0;
            try {
                is.reset();
            } catch (IOException e1) {
                // TODO Auto-generated catch block
                e1.printStackTrace();
            }
            BufferedReader fileInputStreamBufferedReader = new BufferedReader(new InputStreamReader(is));
            if (startPosition != 0) {
                String line;
                try {
                    while (((line = fileInputStreamBufferedReader.readLine())) != null) {
                        if (currentPosition + 1 == startPosition)
                            break;
                        currentPosition++;
                    }
                } catch (IOException e) {
                    // TODO Auto-generated catch block
                    e.printStackTrace();
                }
            }
            try {
                int execLength = 0;
                String line;
                while ((line = fileInputStreamBufferedReader.readLine()) != null && execLength < length) {
                    System.out.println(line);
                    execLength++;
                }
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
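I tested with a small CSV file of 20 records. The problem is that when I debug the classes, almost all of the records are printed. But when I run the classes normally, sometimes 15 records are read and sometimes 12. I'm not sure what the problem is. Any help would be greatly appreciated. Thanks in advance.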
Answer 0 (score: 2)
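The reason you are having problems is that you have many threads reading through different BufferedReader objects that all wrap the same shared ByteArrayInputStream. There is no synchronization, which means that different threads will read parts of the stream that other threads were supposed to read.
Each thread needs its own ByteArrayInputStream.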
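As a rough illustration of that fix, here is a minimal sketch in which every task builds its own ByteArrayInputStream over the same byte array. The array is only read, so sharing it is safe. The class and method names (PerThreadStreamSketch, readChunk, readAllInParallel) are hypothetical and not from the question. The sketch also makes two choices of its own that you may not want: it spreads the leftover rows over the first few tasks instead of giving them to the last one, and it waits for the workers with Future.get() rather than checking isTerminated() right after shutdown(), since shutdown() does not block.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: names and row distribution are assumptions, not the asker's code.
class PerThreadStreamSketch {

    // Reads up to lineCount lines starting at startLine (0-based),
    // using a stream and reader that belong to this call only.
    static List<String> readChunk(byte[] data, int startLine, int lineCount) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            // Skip the lines that belong to earlier chunks.
            for (int i = 0; i < startLine; i++) {
                if (reader.readLine() == null) {
                    return lines;
                }
            }
            String line;
            while (lines.size() < lineCount && (line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Splits totalLines across the given number of tasks and returns all lines in file order.
    static List<String> readAllInParallel(byte[] data, int totalLines, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int base = totalLines / threads;
            int extra = totalLines % threads;
            List<Future<List<String>>> futures = new ArrayList<>();
            int start = 0;
            for (int i = 0; i < threads; i++) {
                // The first 'extra' tasks take one additional row each.
                int size = base + (i < extra ? 1 : 0);
                if (size == 0) {
                    continue;
                }
                final int from = start;
                futures.add(pool.submit(() -> readChunk(data, from, size)));
                start += size;
            }
            List<String> result = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                result.addAll(future.get()); // blocks until that chunk has been read
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            csv.append("row").append(i).append('\n');
        }
        byte[] data = csv.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(readAllInParallel(data, 20, 12).size()); // prints 20
    }
}
```

Each task still skips from the start of the data to reach its first row, so with a million records the later tasks do a lot of redundant reading. If that turns out to be too slow, recording the byte offset of each chunk during the counting pass and starting each stream at that offset would be one way to avoid it.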