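I have a file in Amazon S3 with roughly 2 million records. I want to process these records using threads so that the processing finishes quickly. I know this can be done with Spark or MapReduce, but I cannot use Spark or MR because of a constraint.
This is what I do currently: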
for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
    S3Object s3object = s3Client
            .getObject(new GetObjectRequest(s3Conn.getBucket(), objectSummary.getKey()));
    // try-with-resources closes the reader and the underlying S3 content stream
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(s3object.getObjectContent()))) {
        List<Events> ingEvents = new LinkedList<>();
        String fileLine;
        while ((fileLine = reader.readLine()) != null) {
            // Processing the line
        }
    }
}
Any suggestions on how to do this in Java would be a great help. Thanks in advance. Cheers!
Answer 0 (score: 0)
I would use the "split" command in Linux.
For example, to split a large file into smaller files of 10000 lines each:
# bigfile.txt and the part_ output prefix are placeholder names
split -l 10000 bigfile.txt part_
The Java program can then process each individual file.
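As a rough illustration of that last step, the sketch below hands each chunk file to a fixed thread pool, one task per file. The class name ChunkFileProcessor, the processLine hook, and the part_aa/part_ab file names are all hypothetical, and it assumes the chunks are UTF-8 text files sitting in a single local directory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ChunkFileProcessor {

    // Hypothetical per-record work; replace with the real handling (must be thread-safe).
    static void processLine(String line) {
        // intentionally empty in this sketch
    }

    // Reads one chunk file line by line and returns how many lines it handled.
    static long processFile(Path file) {
        long count = 0;
        try (BufferedReader reader = Files.newBufferedReader(file)) {
            String line;
            while ((line = reader.readLine()) != null) {
                processLine(line);
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }

    // Submits every regular file in chunkDir to a fixed pool and sums the line counts.
    static long processDirectory(Path chunkDir, int threads)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(chunkDir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path file : files) {
                Callable<Long> task = () -> processFile(file);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> result : results) {
                total += result.get(); // blocks until that file is finished
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Two tiny generated files stand in for the output of split.
        Path dir = Files.createTempDirectory("chunks");
        Files.write(dir.resolve("part_aa"), Arrays.asList("r1", "r2", "r3"));
        Files.write(dir.resolve("part_ab"), Arrays.asList("r4", "r5"));
        System.out.println("Processed " + processDirectory(dir, 2) + " lines");
    }
}
```

One task per file keeps the coordination simple: no two threads ever share a reader, so only processLine itself has to be thread-safe.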
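Answer 1 (score: 0)
You can use java.util.Scanner to read the file line by line, or token by token using a regular expression. A short demo of how to do it:
File xmlFile = new File("input.txt"); // placeholder path
try (Scanner sc = new Scanner(xmlFile)) {
    while (sc.hasNextLine()) {
        String nextLine = sc.nextLine();
        // process nextLine here
    }
}
First create the Scanner object, passing the File xmlFile as its argument. Next, read the file line by line and process each line inside the while loop. Once all lines have been read, sc.hasNextLine() returns false and the loop ends. Note that sc.nextLine() does not return null at the end of the input; it throws NoSuchElementException, which is why the loop tests hasNextLine().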
Answer 2 (score: 0)
A simple way to process a file with multiple threads is to use Java 8 lambdas with a parallel stream, for example:
import java.util.Arrays;
import java.util.List;

public class ThreadTest {
    static final String[] myData = {
        "Line 1", "Line 2", "Line 3", "Line 4", "Line 5", "Line 6",
        "Line 7", "Line 8", "Line 9", "Line 10", "Line 11", "Line 12"
    };
    static final List<String> myList = Arrays.asList(myData);

    public static void main(String[] args) {
        // A parallel stream runs the lambda on the calling thread and on the
        // common ForkJoinPool workers, so no separate ExecutorService is needed.
        myList.stream().parallel().forEach(item -> {
            System.out.println("Processing " + item + " in thread " + Thread.currentThread().getName());
        });
    }
}
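If you run it, you will see the lines being processed concurrently across multiple threads (the exact order and thread assignment vary from run to run):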
Processing Line 8 in thread main
Processing Line 4 in thread ForkJoinPool.commonPool-worker-1
Processing Line 9 in thread main
Processing Line 11 in thread ForkJoinPool.commonPool-worker-2
Processing Line 2 in thread ForkJoinPool.commonPool-worker-3
Processing Line 12 in thread ForkJoinPool.commonPool-worker-2
Processing Line 7 in thread main
Processing Line 6 in thread ForkJoinPool.commonPool-worker-1
Processing Line 1 in thread main
Processing Line 10 in thread ForkJoinPool.commonPool-worker-2
Processing Line 3 in thread ForkJoinPool.commonPool-worker-3
Processing Line 5 in thread ForkJoinPool.commonPool-worker-1
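For input that arrives as a stream, such as the BufferedReader in the question, a similar effect can be had with an explicit thread pool whose size you choose yourself. The following is only a sketch under assumptions: BatchedLineProcessor, processLine, and the batch size are hypothetical names and values, and a StringReader stands in for the S3 object content. One thread reads the lines and groups them into batches, and each batch becomes a task for the pool.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchedLineProcessor {

    // Hypothetical per-record work; replace with the real handling (must be thread-safe).
    static void processLine(String line) {
        // intentionally empty in this sketch
    }

    // Processes one batch on a worker thread and reports how many lines it handled.
    static int processBatch(List<String> batch) {
        for (String line : batch) {
            processLine(line);
        }
        return batch.size();
    }

    // Reads sequentially, submits batches to a fixed pool, and returns the total line count.
    static long processAll(BufferedReader reader, int threads, int batchSize)
            throws IOException, InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            List<String> batch = new ArrayList<>(batchSize);
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                if (batch.size() == batchSize) {
                    final List<String> full = batch;
                    Callable<Integer> task = () -> processBatch(full);
                    pending.add(pool.submit(task));
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
            if (!batch.isEmpty()) {
                final List<String> rest = batch; // final, partially filled batch
                Callable<Integer> task = () -> processBatch(rest);
                pending.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Integer> done : pending) {
                total += done.get(); // waits for the batch and rethrows worker failures
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        StringBuilder data = new StringBuilder();
        for (int i = 1; i <= 12; i++) {
            data.append("Line ").append(i).append('\n');
        }
        try (BufferedReader reader = new BufferedReader(new StringReader(data.toString()))) {
            System.out.println(processAll(reader, 3, 5)); // prints 12
        }
    }
}
```

Because the executor's queue is unbounded, a fast reader paired with slow workers can buffer much of the file in memory; with millions of records it may be worth bounding the number of batches in flight.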