Split a string on commas, using a regex that ignores commas inside double quotes

Time: 2017-06-22 07:20:22

Tags: regex split apache-nifi

I am trying to split a string into groups with a regular expression in NiFi. Can anyone help me with a regex that splits the string below?

I have a string like this:

"abc","-9223371901096288826","/home/test/20170614","abc.com","Hello,Test","7462200","4622012","1296614","1029293","893529","a:ce:o:5:l:p:MMM dd HH:mm:ss","Logs","UTF8","<111>Jun 14 12:43:20 logs: Info: 1497462198.717 13073 1.22.333.44 TCP/200 168 TCP_CONNECT 1.22.33.44:443 ""GO\ABC.COM"" DIRECT/img.abc.com - test_abc_7-DefaultGroup-DefaultGroup-NONE-NONE-NONE-DefaultGroup <IW_adv,3.9,-,""-"",-,-,-,-,""-"",-,-,-,""-"",-,-,""-"",""-"",-,-,IW_adv,-,""-"",""-"",""Unknown"",""Unknown"",""-"",""-"",0.10,0,-,""-"",""-"",-,""-"",-,-,""-"",""-"",-,-,""-""> - -"


I want to split on commas while ignoring any commas that appear inside quotes. The result I want looks like this:

    group 1 - abc
    group 2 - -9223371901096288826
    group 3 - /home/test/20170614
    group 4 - abc.com
    group 5 - Hello,Test
    group 6 - 7462200
    group 7 - 4622012
    group 8 - 1296614
    group 9 - 1029293
    group 10 - 893529
    group 11 - a:ce:o:5:l:p:MMM dd HH:mm:ss
    group 12 - Logs
    group 13 - UTF8
    group 14 - <111>Jun 14 12:43:20 logs: Info: 1497462198.717 13073 1.22.333.44 TCP/200 168 TCP_CONNECT 1.22.33.44:443 ""GO\ABC.COM"" DIRECT/img.abc.com - test_abc_7-DefaultGroup-DefaultGroup-NONE-NONE-NONE-DefaultGroup <IW_adv,3.9,-,""-"",-,-,-,-,""-"",-,-,-,""-"",-,-,""-"",""-"",-,-,IW_adv,-,""-"",""-"",""Unknown"",""Unknown"",""-"",""-"",0.10,0,-,""-"",""-"",-,""-"",-,-,""-"",""-"",-,-,""-""> - -


I have tried many regular expressions for the split but could not get the correct result.

I tried the following regex, which I found at this link: ,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)

That regex works well with the split() function in Java, but I do not want to use Java.
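
To illustrate what that lookahead pattern does when it is used for splitting, here is a minimal Python sketch. It is not NiFi code. The helper name `split_quoted` and the shortened sample string are made up for illustration, and the `\"` escapes from the Java version are written as plain `"` because the pattern sits in a Python raw string.

```python
import re

# Split on commas that are followed by an even number of double quotes,
# i.e. commas that sit outside any quoted field.
SPLIT_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')


def split_quoted(line):
    """Hypothetical helper: split `line` and drop each field's outer quotes."""
    fields = SPLIT_OUTSIDE_QUOTES.split(line)
    # Remove one pair of surrounding quotes per field; doubled quotes ("")
    # inside a field are left exactly as they appear in the input.
    return [
        f[1:-1] if len(f) >= 2 and f[0] == '"' and f[-1] == '"' else f
        for f in fields
    ]


# Shortened, made-up sample in the same style as the string in the question.
sample = '"abc","-9223371901096288826","Hello,Test","say ""hi"", ok"'

for number, field in enumerate(split_quoted(sample), start=1):
    print(f"group {number} - {field}")
```

The lookahead rescans the rest of the line for every comma, so this approach is probably only reasonable for short records.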

I also tried the regex (?<=\")([^,]*)(?=\"), which groups the string by commas, but it also splits on the commas inside double quotes.

Can anyone help? Thanks in advance.

1 answer:

Answer 0 (score: 4)

You can get what you need in the following way, without capture groups.

Let us take the string you posted.

1. Use UpdateAttribute to store the whole string in an attribute named "InputString":

"abc","-9223371901096288826","/home/test/20170614","abc.com","Hello,Test","7462200","4622012","1296614","1029293","893529","a:ce:o:5:l:p:MMM dd HH:mm:ss","Logs","UTF8","<111>Jun 14 12:43:20 logs: Info: 1497462198.717 13073 1.22.333.44 TCP/200 168 TCP_CONNECT 1.22.33.44:443 ""GO\ABC.COM"" DIRECT/img.abc.com - test_abc_7-DefaultGroup-DefaultGroup-NONE-NONE-NONE-DefaultGroup <IW_adv,3.9,-,""-"",-,-,-,-,""-"",-,-,-,""-"",-,-,""-"",""-"",-,-,IW_adv,-,""-"",""-"",""Unknown"",""Unknown"",""-"",""-"",0.10,0,-,""-"",""-"",-,""-"",-,-,""-"",""-"",-,-,""-""> - -"

2. After that UpdateAttribute, use a second UpdateAttribute to extract the values, as shown below:

group1:${InputString:getDelimitedField(1)}
group2:${InputString:getDelimitedField(2)}
group3:${InputString:getDelimitedField(3)}
group4:${InputString:getDelimitedField(4)}
group5:${InputString:getDelimitedField(5)}
group6:${InputString:getDelimitedField(6)}
group7:${InputString:getDelimitedField(7)}
group8:${InputString:getDelimitedField(8)}
group9:${InputString:getDelimitedField(9)}
group10:${InputString:getDelimitedField(10)}
group11:${InputString:getDelimitedField(11)}
group12:${InputString:getDelimitedField(12)}
group13:${InputString:getDelimitedField(13)}

Using getDelimitedField is the simplest way to extract these values. See the following reference:

https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#getdelimitedfield
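
For readers who want to try the idea outside NiFi, the sketch below approximates quote-aware, 1-based field extraction with Python's standard `csv` module. It is not NiFi's implementation. The helper name `get_delimited_field` is hypothetical, and details such as whether the surrounding quotes are kept may differ from what NiFi does, so check the guide linked above for the function's optional arguments.

```python
import csv


def get_delimited_field(text, index, delimiter=","):
    """Hypothetical helper: return the 1-based `index`-th field of one CSV-style line."""
    # csv.reader treats delimiters inside double quotes as literal text
    # and collapses a doubled quote ("") into a single quote character.
    row = next(csv.reader([text], delimiter=delimiter))
    return row[index - 1]


# First six fields of the string from the question.
input_string = (
    '"abc","-9223371901096288826","/home/test/20170614",'
    '"abc.com","Hello,Test","7462200"'
)

for i in range(1, 7):
    print(f"group{i}: {get_delimited_field(input_string, i)}")
```

Unlike the output requested in the question, this sketch turns each `""` inside a field into a single `"`.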

If you run into any problems, let me know.