Java: splitting on a delimiter that includes line feed and carriage return

Date: 2018-06-13 13:24:03

Tags: java regex apache-camel

I am using Apache Camel to split a file stream with a regular expression. I need to keep the delimiters, so I use the following regex:

(?=\r\n[0-9]{2}00) for Windows line endings (carriage return followed by line feed).

But I want to support both Windows and Linux line endings, so I thought something like this should work:

(?=\r?\n[0-9]{2}00) or even (?=(\r\n|\n)[0-9]{2}00)

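But for some reason, either of these options gives me some outputs with empty values, so the route breaks when I need to run substring operations and the like on them.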
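One possible cause can be reproduced with plain java.util.regex, outside Camel. A zero-width lookahead with an optional \r matches at two adjacent positions in a Windows file: once before \r and once more between \r and \n. That yields a token containing only "\r", which looks empty and is too short for substring(5,24). The sketch below assumes Camel's regex tokenizer splits the same way as Pattern.split, which is not confirmed here. The GUARDED pattern, the class name and the sample input are hypothetical, not taken from the route.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LookaheadSplitDemo {
    // Pattern from the question: lookahead with an optional carriage return.
    static final Pattern ORIGINAL = Pattern.compile("(?=\\r?\\n[0-9]{2}00)");

    // Hypothetical variant (not from the question): the added lookbehind
    // rejects the position between \r and \n, so \r\n is never cut in half.
    static final Pattern GUARDED = Pattern.compile("(?<!\\r)(?=\\r?\\n[0-9]{2}00)");

    // Makes line terminators visible when printing.
    static String show(String[] parts) {
        return Arrays.toString(parts).replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Made-up miniature of the file: a header, then records starting with NN00.
        String windows = "HEADER\r\n0500AAA\r\n0501BBB\r\n0500CCC";

        System.out.println(show(ORIGINAL.split(windows)));
        // prints [HEADER, \r, \n0500AAA\r\n0501BBB, \r, \n0500CCC]

        System.out.println(show(GUARDED.split(windows)));
        // prints [HEADER, \r\n0500AAA\r\n0501BBB, \r\n0500CCC]
    }
}
```

With Linux line endings both patterns produce the same three tokens, so the stray "\r" tokens would only appear for Windows files.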

I have also tested other options, such as:

(?=.\n[0-9]{2}00) and (?=^[0-9]{2}00) (with the latter, the newline stays with the previous group)

But with any of these I get an empty output either at the beginning or at the end.
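For the ^ variant, note that in java.util.regex the ^ anchor matches only at the very start of the input unless multiline mode is enabled, and that in multiline mode Java treats \r\n as a single line terminator. A minimal sketch of that idea in plain Java follows. The inline (?m) flag, the class name and the sample input are assumptions added for illustration, and whether Camel's tokenizer passes the flag through unchanged has not been checked here.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class MultilineSplitDemo {
    // Hypothetical pattern: split at the start of every line that begins
    // with two digits followed by "00". (?m) lets ^ match after a line terminator.
    static final Pattern RECORD_START = Pattern.compile("(?m)^(?=[0-9]{2}00)");

    // Makes line terminators visible when printing.
    static String show(String[] parts) {
        return Arrays.toString(parts).replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Made-up miniature inputs, one per line-ending style.
        String windows = "HEADER\r\n0500AAA\r\n0501BBB\r\n0500CCC";
        String linux = "HEADER\n0500AAA\n0501BBB\n0500CCC";

        System.out.println(show(RECORD_START.split(windows)));
        // prints [HEADER\r\n, 0500AAA\r\n0501BBB\r\n, 0500CCC]

        System.out.println(show(RECORD_START.split(linux)));
        // prints [HEADER\n, 0500AAA\n0501BBB\n, 0500CCC]
    }
}
```

With this form each token starts directly at the record, so the line terminator ends the previous token and fixed column offsets are counted from the first character of the record.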

Edit: added the Camel route

<route id="FileRoute" trace="false">
    <from uri="file:{{file.path}}?move=.done&amp;readLock=changed&amp;readLockCheckInterval=1500&amp;charset=UTF-8"/>

    <split streaming="true" stopOnException="true" shareUnitOfWork="true" parallelProcessing="true">
        <tokenize token="(?=\R[0-9]{2}00)" regex="true" />

        <setHeader headerName="CamelSplitIndex">
            <!-- line number basically -->
            <simple>${property.CamelSplitIndex}</simple>
        </setHeader>
        <setHeader headerName="CamelSplitComplete">
            <!-- did I read the last line? -->
            <simple>${property.CamelSplitComplete}</simple>
        </setHeader>

        <log message="Line: ${property.CamelSplitIndex}: BODY: ${body}"/>

        <choice>
            <when>
                <simple>${property.CamelSplitIndex} == 0</simple>
                <!-- skip first line as it's a header -->
            </when>
            <otherwise>

                <setProperty propertyName="StringFound">
                    <simple>${body.substring(5,24).trim()}</simple>
                </setProperty>

                <setHeader headerName="ModifiedStringFound">
                    <simple>${property.StringFound}</simple>
                </setHeader>

                <process ref="ProcessName"/>

                <transform>
                    <simple>${body.replace(${property.StringFound}, ${header.ModifiedStringFound})}</simple>
                </transform>

                <removeHeader headerName="ModifiedStringFound"/>

            </otherwise>
        </choice>

        <aggregate strategyRef="AggregationStrategy" completionTimeout="15000">
            <correlationExpression>
                <simple>${in.header.CamelFileName}</simple>
            </correlationExpression>
            <completionTimeout>
                <header>timeout</header>
            </completionTimeout>
            <to uri="log:com.blah.blah.out"/>
        </aggregate>

    </split>

</route>

Error:

  

org.apache.camel.language.bean.RuntimeBeanExpressionException: Failed to invoke method: .substring(5,24).trim() on null due to: org.apache.camel.language.bean.RuntimeBeanExpressionException: Failed to invoke method: substring(5,24) on null due to: java.lang.StringIndexOutOfBoundsException: String index out of range: 24
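The StringIndexOutOfBoundsException means that substring(5,24) was called on a token shorter than 24 characters. Independently of the regex, a length check before slicing keeps short or empty tokens from failing the whole split. The helper below is only a sketch in plain Java: the class and method names are hypothetical, and wiring it into the route (for example through a bean or processor) is left out.

```java
import java.util.Optional;

class FixedWidthSlice {
    // Returns the trimmed characters in [start, end) of the token,
    // or Optional.empty() when the token is null or too short to slice.
    static Optional<String> slice(String token, int start, int end) {
        if (token == null || start < 0 || start > end || token.length() < end) {
            return Optional.empty();
        }
        return Optional.of(token.substring(start, end).trim());
    }

    public static void main(String[] args) {
        // A lone carriage return, as produced when \r\n is split in the middle.
        System.out.println(slice("\r", 5, 24).isPresent());
        // prints false

        // Start of a record from the sample file, with its leading \r\n kept.
        String token = "\r\n05007684546545465745     1740";
        System.out.println(slice(token, 5, 24).orElse("<too short>"));
        // prints 07684546545465745
    }
}
```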

Sample file:

78643756435694369    4754757864254578754578545457                                                                                                                0071
05007684546545465745     1740266981415800014580631000874120180521185558     000000000247986DFGBDFH FDGDGJHUHJK   SDFGSGDFGf      GT 541100898  00710047503051 0220180522
0501            000000  000000000000                                     046    00000103971056242218759000000000000 000000000000 000000                    00000000000  
0502            GH    001000000000000000000                                                                                                                             
05005212455451257521     1740266981415800001820031000874120180521183349     000000001817986FGHDFHFGJFGDHGDFDFH        FDGFDHGFDHDFH 541100898  00710043090051 0220180522
0501            000000  000000000000                                     046    00000100293449142130526000000000000 000000000000 000000                    00000000000  
0502            FD    001000000000000000000                                                                                                                             
05009789265762578888     1740266981415800012612361003716920180521173412     000000004859986DFHDGJFGJFGJKHGJGHJ   GDHFGHFGHFGH 541100898  00710029706451 0220180522
0501            000000  000000000000                                     046    00000103058175142271046000000000000 000000000000 000000                    00000000000  
0502            AR    001000000000000000000                                                                                                                             
05008758407825904958     1740266981415800004933011003716920180521173559     000000000798986FGHGFGHRTUJHGJDGHYHTJK    DHDGJFHJHFHJ    NJ 541100898  00710030461251 0220180522
0501            000000  000000000000                                     046    00000100902124678647109000000000000 000000000000 000000                    00000000000  
0502            TY    001000000000000000000                                                                                                                             
05004987785686893465     1740266981415800003253131003716920180521174415     000000001142986FDGFDGHFGJTYUJDFHGDEHGTGFH     DFHDYRT     BG 541100898  00710032033851 0220180522
0501            000000  000000000000                                     046    00000100620274678526079000000000000 000000000000 000000                    00000000000  
0502            UI    001000000000000000000                                                                                                                             

Any idea how to solve this?

0 answers:

No answers yet.