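I am trying to split a string with a regular expression. I need to do this in NiFi, splitting the string into groups with a regex. Can anyone help me split the string below using a regular expression?
I have a string like this: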
"abc","-9223371901096288826","/home/test/20170614","abc.com","Hello,Test","7462200","4622012","1296614","1029293","893529","a:ce:o:5:l:p:MMM dd HH:mm:ss","Logs","UTF8","<111>Jun 14 12:43:20 logs: Info: 1497462198.717 13073 1.22.333.44 TCP/200 168 TCP_CONNECT 1.22.33.44:443 ""GO\ABC.COM"" DIRECT/img.abc.com - test_abc_7-DefaultGroup-DefaultGroup-NONE-NONE-NONE-DefaultGroup <IW_adv,3.9,-,""-"",-,-,-,-,""-"",-,-,-,""-"",-,-,""-"",""-"",-,-,IW_adv,-,""-"",""-"",""Unknown"",""Unknown"",""-"",""-"",0.10,0,-,""-"",""-"",-,""-"",-,-,""-"",""-"",-,-,""-""> - -"
I want to split on commas, but commas inside the quotes must be ignored. The result I want looks like this:
group 1 - abc
group 2 - -9223371901096288826
group 3 - /home/test/20170614
group 4 - abc.com
group 5 - Hello,Test
group 6 - 7462200
group 7 - 4622012
group 8 - 1296614
group 9 - 1029293
group 10 - 893529
group 11 - a:ce:o:5:l:p:MMM dd HH:mm:ss
group 12 - Logs
group 13 - UTF8
group 14 - <111>Jun 14 12:43:20 logs: Info: 1497462198.717 13073 1.22.333.44 TCP/200 168 TCP_CONNECT 1.22.33.44:443 ""GO\ABC.COM"" DIRECT/img.abc.com - test_abc_7-DefaultGroup-DefaultGroup-NONE-NONE-NONE-DefaultGroup <IW_adv,3.9,-,""-"",-,-,-,-,""-"",-,-,-,""-"",-,-,""-"",""-"",-,-,IW_adv,-,""-"",""-"",""Unknown"",""Unknown"",""-"",""-"",0.10,0,-,""-"",""-"",-,""-"",-,-,""-"",""-"",-,-,""-""> - -
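I have tried many regular expressions for the split but could not get the correct result.
I tried this regular expression, which I found at this link:
,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)
It works well with the split() function in Java, but I do not want to use Java.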
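For reference, the lookahead split can be tried outside NiFi. The sketch below uses Python's re.split rather than Java's split(). The sample is a shortened, hypothetical version of the input line, and the backslashes before the quotes (Java string escaping) are dropped. A comma counts as a separator only when an even number of double quotes follows it, so each field keeps its surrounding quotes.

```python
import re

# Split only on commas that are followed by an even number of double quotes,
# i.e. commas that sit outside a quoted field.
SEPARATOR = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def split_outside_quotes(line):
    """Return the fields of `line`, surrounding quotes included."""
    return SEPARATOR.split(line)

# Shortened sample (an assumption for illustration, not the full log line).
sample = '"abc","Hello,Test","x ""y,z"" w"'
fields = split_outside_quotes(sample)
for number, field in enumerate(fields, start=1):
    print(f"group {number} - {field}")
```

The outer quotes stay in each field, so a further step would be needed to strip them.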
I also tried this regular expression:
(?<=\")([^,]*)(?=\")
It groups the string by commas, but it also splits on the double quotes.
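One possible alternative to the lookbehind attempt, sketched here in Python, is to match each quoted field as a whole instead of looking for the text between quotes. The pattern is an illustrative assumption, not taken from the linked post. It treats a doubled quote ("") as part of the field, so commas and doubled quotes inside a field stay together. Whether a given NiFi processor exposes the matches as numbered groups in the same way is not checked here.

```python
import re

# One quoted field: an opening quote, then any run of non-quote characters
# or doubled quotes (""), then the closing quote. Group 1 is the content.
QUOTED_FIELD = re.compile(r'"((?:[^"]|"")*)"')

def quoted_fields(line):
    """Return the content of every quoted field, without the outer quotes."""
    return QUOTED_FIELD.findall(line)

# Shortened sample (an assumption for illustration, not the full log line).
sample = '"abc","Hello,Test","x ""y,z"" w"'
groups = quoted_fields(sample)
for number, value in enumerate(groups, start=1):
    print(f"group {number} - {value}")
```

The doubled quotes inside the last field are left as they are, which matches the form shown for group 14 in the desired result.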
Can anyone help me? Thanks in advance.
Answer 0 (score: 4)
You can get what you need as follows, without using capture groups.
Take the string from your question.
1. Use UpdateAttribute to store the whole string in an attribute named "InputString":
"abc","-9223371901096288826","/home/test/20170614","abc.com","Hello,Test","7462200","4622012","1296614","1029293","893529","a:ce:o:5:l:p:MMM dd HH:mm:ss","Logs","UTF8","<111>Jun 14 12:43:20 logs: Info: 1497462198.717 13073 1.22.333.44 TCP/200 168 TCP_CONNECT 1.22.33.44:443 ""GO\ABC.COM"" DIRECT/img.abc.com - test_abc_7-DefaultGroup-DefaultGroup-NONE-NONE-NONE-DefaultGroup <IW_adv,3.9,-,""-"",-,-,-,-,""-"",-,-,-,""-"",-,-,""-"",""-"",-,-,IW_adv,-,""-"",""-"",""Unknown"",""Unknown"",""-"",""-"",0.10,0,-,""-"",""-"",-,""-"",-,-,""-"",""-"",-,-,""-""> - -"
2. After that UpdateAttribute, use a second UpdateAttribute processor to extract the values, as follows:
group1:${InputString:getDelimitedField(1)}
group2:${InputString:getDelimitedField(2)}
group3:${InputString:getDelimitedField(3)}
group4:${InputString:getDelimitedField(4)}
group5:${InputString:getDelimitedField(5)}
group6:${InputString:getDelimitedField(6)}
group7:${InputString:getDelimitedField(7)}
group8:${InputString:getDelimitedField(8)}
group9:${InputString:getDelimitedField(9)}
group10:${InputString:getDelimitedField(10)}
group11:${InputString:getDelimitedField(11)}
group12:${InputString:getDelimitedField(12)}
group13:${InputString:getDelimitedField(13)}
getDelimitedField is the simplest way to extract these values. For reference, see https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#getdelimitedfield
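To cross-check which value each field index should produce, the same line can be parsed outside NiFi. This is only a sketch using Python's csv module, not NiFi's implementation of getDelimitedField. The helper name delimited_field is hypothetical, and the sample is a shortened version of the input. Note that csv removes the outer quotes and collapses doubled quotes, which may differ from what getDelimitedField returns with its default arguments.

```python
import csv

def delimited_field(line, index):
    """Return field number `index` (1-based) of a comma-separated, quoted line."""
    # csv.reader treats "..." as one field and "" inside it as a literal quote.
    fields = next(csv.reader([line]))
    return fields[index - 1]

# Shortened sample (first five fields of the input line only).
sample = '"abc","-9223371901096288826","/home/test/20170614","abc.com","Hello,Test"'
for index in range(1, 6):
    print(f"group{index}: {delimited_field(sample, index)}")
```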
If you run into any problems, let me know.