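I need to read a large CSV file (328 MB) and process it. Processing each row also involves calling a web service.
This is my first time using ThreadPoolExecutor. My plan is to split off every 100 rows of the CSV and create a thread that processes each of those rows and writes the results to a temp file. Once all threads have finished, I will read the temp files and create one merged output file.
My method that splits the file and creates the threads: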
private List<Thread> invokeWS(String csvFilename, String tempFolder) {
    List<Thread> processCsvThreadList = new ArrayList<Thread>();
    //Thread Pool Executer
    int corePoolSize = 3;
    int maximumPoolSize = 6;
    long keepAliveTime = 10;
    ThreadFactory threadFactory = Executors.defaultThreadFactory();
    ThreadPoolExecutor thrdPoolEx = new ThreadPoolExecutor(corePoolSize,
            maximumPoolSize, keepAliveTime, TimeUnit.SECONDS,
            new ArrayBlockingQueue<Runnable>(2));
    try {
        BufferedReader bfr = new BufferedReader(new FileReader(csvFilename));
        String line = "";
        int i = 0;
        line = bfr.readLine();
        Thread csvThread;
        List<String> rowList = new ArrayList<String>();
        do {
            line = bfr.readLine();
            if (line != null) {
                rowList.add(line);
                i++;
                if (i % 100 == 0) {
                    csvThread = new Thread(new ProcessCsvRow(rowList,
                            tempFolder));
                    csvThread.start();
                    thrdPoolEx.execute(csvThread);
                    rowList = new ArrayList<String>();
                    processCsvThreadList.add(csvThread);
                }
            } else {
                if (null != rowList && !rowList.isEmpty()) {
                    csvThread = new Thread(new ProcessCsvRow(rowList,
                            tempFolder));
                    csvThread.start();
                    thrdPoolEx.execute(csvThread);
                    processCsvThreadList.add(csvThread);
                }
                break;
            }
        } while (true);
    } catch (FileNotFoundException fnf) {
        fnf.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        thrdPoolEx.shutdown();
    }
    return processCsvThreadList;
}
My ProcessCsvRow class:
public class ProcessCsvRow implements Runnable {
    private List<String> csvRowsList;
    private String tempDir;

    public ProcessCsvRow(List<String> csvRowsList, String tempDir) {
        this.csvRowsList = csvRowsList;
        this.tempDir = tempDir;
    }

    @Override
    public void run() {
        UUID idOne = UUID.randomUUID();
        FileWriter fw = null;
        BufferedWriter bufferedWriter = null;
        try {
            String res = "";
            fw = new FileWriter(new File(tempDir + "\\" + idOne.toString()+FilePropConstants.FILE_NAME_EXT_TMP));
            bufferedWriter = new BufferedWriter(fw);
            SentimentAnalyzer sentimentAnalyzer = new SentimentAnalyzer();
            for (String csvRow : csvRowsList) {
                //calling webservice for each row
                res = sentimentAnalyzer.invokeSentWS(csvRow);
                bufferedWriter.write(res);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (bufferedWriter != null) {
                    bufferedWriter.flush();
                    bufferedWriter.close();
                }
                if (fw != null) {
                    fw.close();
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}
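The problem: for a CSV with 5 rows, exactly one temp file should be created, but when I run this program I get two temp files, which is wrong. I strongly suspect this is not a logic problem but a problem with the way I use ThreadPoolExecutor.
Any help is greatly appreciated.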
Answer 0 (score: 1)
You should not create Thread objects yourself, and you do not need to construct the thread pool directly either.
Try:
ExecutorService es = Executors.newFixedThreadPool(8);
es.submit(runnable); // not threads
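BTW, each task has to create its own output file, or you need to lock a shared file. Alternatively, you can submit a Callable and have it return whatever you want to write, so that the submitting thread does the logging.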
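The Callable variant mentioned above could look roughly like the following. This is a minimal sketch, not the asker's code: the class name CallableBatchSketch and the method processRow are hypothetical, and processRow merely stands in for the real web service call (sentimentAnalyzer.invokeSentWS in the question). Each task returns its results through a Future, and only the submitting thread merges them, so no temp files or locks are needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableBatchSketch {

    // Hypothetical placeholder for the web service call made per row.
    static String processRow(String row) {
        return row.toUpperCase();
    }

    // Runs one Callable per batch and merges the results in submission order.
    static List<String> processBatches(List<List<String>> batches)
            throws InterruptedException, ExecutionException {
        ExecutorService es = Executors.newFixedThreadPool(8);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<List<String>> task = () -> {
                    List<String> out = new ArrayList<>();
                    for (String row : batch) {
                        out.add(processRow(row));
                    }
                    return out;
                };
                // Submit the task itself, never a Thread wrapping it.
                futures.add(es.submit(task));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                // get() blocks until that batch has finished.
                merged.addAll(future.get());
            }
            return merged;
        } finally {
            es.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("a", "b"),
                Arrays.asList("c"));
        System.out.println(processBatches(batches)); // prints [A, B, C]
    }
}
```

For a 328 MB input, holding every result in memory may not be acceptable. In that case the submitting thread could write each batch to the output file as soon as its Future completes instead of collecting everything into one list.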
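Answer 1 (score: 1)
This happens because you both start the thread yourself and ask the executor to execute it. As a result the same ProcessCsvRow can run twice, once on the thread you started and once on a pool worker, and each run creates a temp file under a new random UUID name.
Change: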
csvThread = new Thread(new ProcessCsvRow(rowList, tempFolder));
csvThread.start();
thrdPoolEx.execute(csvThread);
rowList = new ArrayList<String>();
processCsvThreadList.add(csvThread);
to:
csvThread = new Thread(new ProcessCsvRow(rowList, tempFolder));
thrdPoolEx.execute(csvThread);
rowList = new ArrayList<String>();
processCsvThreadList.add(csvThread);
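Since the executor only needs a Runnable, the Thread wrapper can be dropped altogether. The sketch below shows that shape under stated assumptions: the class name SubmitRunnableSketch, the method writeBatches, the pool size of 4 and the one-minute timeout are all illustrative, and the task simply writes its rows to a file instead of calling a web service. It passes each batch to the pool as a Runnable, waits for completion with shutdown() and awaitTermination(), and then counts the temp files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.stream.Stream;

public class SubmitRunnableSketch {

    // Splits rows into batches, runs one task per batch on the pool,
    // and returns how many temp files ended up in tempDir.
    static long writeBatches(List<String> rows, int batchSize, Path tempDir)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int start = 0; start < rows.size(); start += batchSize) {
            int end = Math.min(start + batchSize, rows.size());
            List<String> batch = new ArrayList<>(rows.subList(start, end));
            // Hand the Runnable to the pool; no new Thread(...), no start().
            pool.execute(() -> {
                Path out = tempDir.resolve(UUID.randomUUID() + ".tmp");
                try {
                    Files.write(out, batch);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
        }
        // Stop accepting new tasks, then wait for the queued ones to finish.
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        try (Stream<Path> files = Files.list(tempDir)) {
            return files.count();
        }
    }

    public static void main(String[] args) throws Exception {
        Path tempDir = Files.createTempDirectory("csv-batches");
        List<String> rows = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        // 5 rows with a batch size of 100 form a single batch.
        System.out.println(writeBatches(rows, 100, tempDir)); // prints 1
    }
}
```

Waiting on the pool with awaitTermination() also replaces the returned List<Thread>: Thread objects that are handed to an executor are never started, so they cannot be used to tell when the work is done.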