Consume all files in AWS S3 with deleteAfterRead=false

Time: 2015-03-03 13:34:52

Tags: java amazon-s3 apache-camel

Is there a way to consume all the files in an S3 bucket (about 15,000 of them) without deleting them from S3?

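Because the aws-s3 component lacks a noop parameter, the following configuration has a problem: it keeps retrieving the same 5 files over and over again.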

    <endpoint id="fbPage" uri="aws-s3://bucket?amazonS3Client=#aws-credential&amp;deleteAfterRead=false&amp;maxMessagesPerPoll=5&amp;prefix=dev/facebook/page"/>

    <route id="consumeS3FbPage">
        <from uri="ref:fbPage"/>
        <choice>
            <when>                  
                <simple>${header.CamelAwsS3ContentLength}  &gt; 0</simple> 
                <log message="Page File detected: ${header.CamelAwsS3Key}"/>
                <bean ref="dfaReportingRePull" method="s3toElasticFormat"/>

                <setHeader headerName="CamelHttpMethod">
                    <constant>POST</constant>
                </setHeader>
                <to uri="http://localhost:9200/fb_camel/page/_bulk"/>
                <log message="Success"/>
            </when>
            <when>
                <simple>${header.CamelAwsS3ContentLength} == 0</simple>
                <log message="Empty content, Probably the s3 key Folder itself: ${header.CamelAwsS3Key}"/>
            </when>
        </choice>               
    </route>

The following log shows the same 5 files being retrieved on every poll:

    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:46,904 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/05/31/9c9537e6-12a3-415e-aa3d-a450011008be.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:46,993 INFO  consumeS3FbPage - Success
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:46,994 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/01/97d85443-74af-4d64-9808-a4500110117a.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,002 INFO  consumeS3FbPage - Success
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,002 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/02/223410b2-b4ce-4b7f-8e47-a45001101254.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,010 INFO  consumeS3FbPage - Success
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,011 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/03/e5c21710-d764-453d-9736-a4500110132e.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,019 INFO  consumeS3FbPage - Success
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,019 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/04/851d3759-0c35-4679-838c-a4500110140b.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:47,027 INFO  consumeS3FbPage - Success

    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,375 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/05/31/9c9537e6-12a3-415e-aa3d-a450011008be.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,396 INFO  consumeS3FbPage - Success
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,397 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/01/97d85443-74af-4d64-9808-a4500110117a.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,409 INFO  consumeS3FbPage - Success
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,410 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/02/223410b2-b4ce-4b7f-8e47-a45001101254.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,419 INFO  consumeS3FbPage - Success
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,420 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/03/e5c21710-d764-453d-9736-a4500110132e.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,429 INFO  consumeS3FbPage - Success
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,430 INFO  consumeS3FbPage - Page File detected: dev/facebook/page/166866083481130/2014/06/04/851d3759-0c35-4679-838c-a4500110140b.json
    [Camel (camel-1) thread #0 - aws-s3://bucket] 21:26:51,439 INFO  consumeS3FbPage - Success

Even when I add an idempotent consumer, it simply detects all 5 files as duplicates and ignores them, so nothing new is ever processed.
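To see why the idempotent consumer stalls, here is a minimal in-memory model in plain Java. It is not the Camel or AWS SDK API: the class and method names are hypothetical, and it assumes (as the question describes for 2.14.0) that every poll lists the first `maxMessagesPerPoll` keys under the prefix without passing a marker. A `HashSet` stands in for the idempotent repository.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical in-memory model, NOT the Camel or AWS SDK API.
 * Assumption: each poll lists the first maxKeys keys under the prefix
 * and never passes a marker, as described for the 2.14.0 aws-s3 consumer
 * with deleteAfterRead=false.
 */
public class IdempotentNoProgressDemo {

    // Returns at most maxKeys keys starting with prefix, in the list's (sorted) order.
    static List<String> listFirstPage(List<String> sortedKeys, String prefix, int maxKeys) {
        List<String> page = new ArrayList<>();
        for (String key : sortedKeys) {
            if (page.size() == maxKeys) {
                break;
            }
            if (key.startsWith(prefix)) {
                page.add(key);
            }
        }
        return page;
    }

    // Simulates `polls` polls behind an idempotent filter; returns the keys actually processed.
    static List<String> consume(List<String> bucket, String prefix, int maxKeys, int polls) {
        Set<String> seen = new HashSet<>();      // stands in for an idempotent repository
        List<String> processed = new ArrayList<>();
        for (int i = 0; i < polls; i++) {
            for (String key : listFirstPage(bucket, prefix, maxKeys)) {
                if (seen.add(key)) {             // add() returns false for a duplicate
                    processed.add(key);
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> bucket = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            bucket.add(String.format("dev/facebook/page/%02d.json", i));
        }
        List<String> processed = consume(bucket, "dev/facebook/page", 5, 10);
        // Ten polls, yet only the first five keys are ever processed.
        System.out.println(processed.size()); // prints 5
    }
}
```

Every poll after the first returns only duplicates, so the filter discards them all and key `05` is never reached: deduplication cannot help when the listing itself never advances.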

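I wondered whether it would work to use deleteAfterRead and then put the files back. It would not: looking at the code referenced in http://camel.465427.n5.nabble.com/camel-aws-s3-get-only-files-I-need-td5714095.html, the consumer appears to loop only over the keys in the listing currently returned by AWS S3.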

Looking at ListObjectsRequest.java, I found that it lets you set a marker indicating the last processed S3 key. Is there a way to set this marker through the Camel Spring DSL? [Update] No.
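Marker-based listing avoids the stall by carrying the last key of each page into the next request. The sketch below models this in memory; the names are hypothetical and it is not the AWS SDK. In the AWS SDK for Java v1 the corresponding pieces are `ListObjectsRequest.setMarker(String)` and `ObjectListing.isTruncated()`. Per the S3 ListObjects documentation, `NextMarker` is only returned when a delimiter is set, so without one the last key of the page serves as the next marker, which is what the sketch does. It assumes `maxKeys > 0` and a key list already in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical in-memory model of marker-based listing, NOT the AWS SDK.
 * Assumes maxKeys > 0 and sortedKeys in ascending (ASCII) order.
 */
public class MarkerPagingDemo {

    /** One page of results: the keys plus whether more matching keys remain. */
    static final class Page {
        final List<String> keys;
        final boolean truncated;

        Page(List<String> keys, boolean truncated) {
            this.keys = keys;
            this.truncated = truncated;
        }
    }

    // Up to maxKeys keys with the prefix that sort strictly after marker (null = from the start).
    static Page listObjects(List<String> sortedKeys, String prefix, String marker, int maxKeys) {
        List<String> keys = new ArrayList<>();
        boolean truncated = false;
        for (String key : sortedKeys) {
            if (!key.startsWith(prefix) || (marker != null && key.compareTo(marker) <= 0)) {
                continue;
            }
            if (keys.size() == maxKeys) {
                truncated = true;   // at least one more matching key exists
                break;
            }
            keys.add(key);
        }
        return new Page(keys, truncated);
    }

    // Polls page by page, using the last key of each page as the next marker.
    static List<String> consumeAll(List<String> sortedKeys, String prefix, int maxKeys) {
        List<String> processed = new ArrayList<>();
        String marker = null;
        Page page;
        do {
            page = listObjects(sortedKeys, prefix, marker, maxKeys);
            processed.addAll(page.keys);
            if (!page.keys.isEmpty()) {
                marker = page.keys.get(page.keys.size() - 1);
            }
        } while (page.truncated);
        return processed;
    }

    public static void main(String[] args) {
        List<String> bucket = new ArrayList<>();
        for (int i = 0; i < 12; i++) {
            bucket.add(String.format("dev/facebook/page/%02d.json", i));
        }
        List<String> processed = consumeAll(bucket, "dev/facebook/page", 5);
        System.out.println(processed.size()); // prints 12
    }
}
```

With 12 keys and a page size of 5, three listings cover every key exactly once without deleting anything, which is the behaviour the question is after.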

After digging deeper into the code, I found the root cause. It is tracked in this JIRA ticket: https://issues.apache.org/jira/browse/CAMEL-8431

Note: the Camel version is 2.14.0.

1 answer:

Answer 0 (score: 0)

According to Apache committer Willem Jiang, the fix will be part of the 2.14.3 release (CAMEL-8431).