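I am trying to remove blank lines and invalid records with a single regular expression, but it does not seem to work. In the sample below, records with ServerSerial "0" and an empty ServerName "" are invalid: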
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"COL_P_1","deletedat":"2018-08-24 15:30:48.136"}
The following regex removes only the invalid entries; it does not remove the blank lines they leave behind:
.*(?<=ServerSerial":")0(?=").*|.*(?<=ServerName":")(?=").*
I also tried this, with no luck:
.*(?<=ServerSerial":")0(?=").*[\r\n]*|.*(?<=ServerName":")(?=").*[\r\n]*
The current output still contains blank lines where the invalid records used to be:
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O","deletedat":"2018-08-24 15:30:48.136"},

{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1","deletedat":"2018-08-24 15:30:48.136"},


{"eventType":"delete","ServerSerial":"1142691750","ServerName":"COL_P_1","deletedat":"2018-08-24 15:30:48.136"}
But the expected output is:
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"COL_P_1","deletedat":"2018-08-24 15:30:48.136"}
Answer 0 (score: 0)
Method 1: Using a single ReplaceText processor:
I used the regex you mentioned in the question.
Configure the ReplaceText processor as follows:
Search Value
.*(?<=ServerSerial":")0(?=").*[\r\n]*|.*(?<=ServerName":")(?=").*[\r\n]*
Replacement Value
${literal("")} //there are no capture groups, so an empty value is used as the replacement
Input:
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"COL_P_1","deletedat":"2018-08-24 15:30:48.136"}
Output:
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"COL_P_1","deletedat":"2018-08-24 15:30:48.136"}
Method 2: Using the QueryRecord processor:
If you know the schema of your data, you can use the QueryRecord processor and add a new property to it with this value:
select * from FLOWFILE where ServerName is not null and ServerSerial > 0
The processor then outputs a flowfile containing only the records that satisfy the SQL query above.
Method 3: Using two ReplaceText processors in series:
After the first ReplaceText processor has removed the invalid records, configure a second ReplaceText processor as follows:
Search Value
\n+\s+
Replacement Value
shift+enter //press Shift+Enter in the value field to enter a newline
Character Set
UTF-8
Maximum Buffer Size
1 MB //change this value to suit your flowfile size
Replacement Strategy
Regex Replace
Evaluation Mode
Entire text
I tried this on my local instance, using the three valid records with blank lines between them as the input flowfile content.
Output flowfile content:
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"COL_P_1","deletedat":"2018-08-24 15:30:48.136"}
Refer to this link for replacing empty lines in a flowfile.
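To see what the single-regex approach does outside NiFi, here is a minimal Python sketch. It assumes the whole content is processed as one string (comparable to the "Entire text" evaluation mode) and uses Python's re module instead of NiFi's Java regex engine, so treat it as an illustration rather than a reproduction of ReplaceText. The record() helper and the variable names are hypothetical.

```python
import re

# The question's second regex: an invalid line plus any line breaks after it.
INVALID_LINE = re.compile(
    r'.*(?<=ServerSerial":")0(?=").*[\r\n]*'
    r'|.*(?<=ServerName":")(?=").*[\r\n]*'
)


def drop_invalid(text: str) -> str:
    """Delete every line whose ServerSerial is "0" or whose ServerName is empty."""
    return INVALID_LINE.sub("", text)


def record(serial: str, name: str) -> str:
    """Build one line shaped like the sample data (hypothetical helper)."""
    return (
        '{"eventType":"delete","ServerSerial":"%s","ServerName":"%s",'
        '"deletedat":"2018-08-24 15:30:48.136"}' % (serial, name)
    )


sample = ",\n".join([
    record("1142691750", "XYZ_P_O"),
    record("0", ""),
    record("1142691950", "ABC_P_1"),
    record("0", ""),
    record("0", ""),
    record("1142691750", "COL_P_1"),
])

cleaned = drop_invalid(sample)
print(cleaned)
```

Because the trailing [\r\n]* is consumed together with each invalid line, no blank line is left behind in this sketch. If the last record in the input were invalid, the line break before it would remain as a trailing newline.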
Answer 1 (score: 0)
Add this alternative to the front of your second regex:
(?<=[\r\n])[\r\n]|
It removes empty lines by deleting any line break that directly follows another line break.
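The effect of that lookbehind can be checked in isolation. The sketch below is an assumption-laden illustration in Python, not NiFi: it applies only the blank-line alternative, and it assumes LF-only line endings. The function name is hypothetical.

```python
import re

# A line break that is immediately preceded by another line break.
BLANK_LINE_BREAK = re.compile(r'(?<=[\r\n])[\r\n]')


def drop_blank_lines(text: str) -> str:
    """Remove the extra line breaks that make up empty lines (LF input assumed)."""
    return BLANK_LINE_BREAK.sub("", text)


with_blanks = "first,\n\nsecond,\n\n\nthird"
without_blanks = drop_blank_lines(with_blanks)
print(without_blanks)
```

One caveat: with CRLF line endings the \n of every \r\n pair is itself preceded by a line-break character, so this pattern would strip it as well. Converting the content to LF first avoids that.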
Answer 2 (score: 0)
Once you have converted the file to a UNIX file (LF line endings), you can use
grep -Ev 'ServerSerial":"0?"|ServerName":"0?"' inputfile
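For readers without grep at hand, the same inverted match can be sketched in Python. This is only an analogue of grep -Ev under the assumption that the input is already split into lines; the function name is hypothetical.

```python
import re

# Same extended regex as the grep command: ServerSerial or ServerName
# whose value is either "0" or empty.
INVALID = re.compile(r'ServerSerial":"0?"|ServerName":"0?"')


def keep_valid(lines):
    """Return the lines that do NOT match the pattern, like grep -v."""
    return [line for line in lines if not INVALID.search(line)]


lines = [
    '{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O"},',
    '{"eventType":"delete","ServerSerial":"0","ServerName":""},',
    '{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1"}',
]
kept = keep_valid(lines)
print("\n".join(kept))
```

Because whole lines are dropped, removing an invalid record leaves no blank line. Blank lines that already exist in the input do not match the pattern, so an inverted match like this keeps them.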
Answer 3 (score: 0)
You can get rid of these blank lines as follows.
Use the ReplaceText processor.
Search: \n\n\s|\n\s
Replace: \n
Reference: How to use regex to remove the spaces between two rows?
Let me know if you run into any issues.
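Answer 4 (score: 0)
If all your records are line-based, you can solve this with Perl. In a Perl one-liner, the hex escape \x22 can stand in for the double quote. See whether the following works for you. I also added blank lines to your input.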
>cat regex_event.dat
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"0","ServerName":"","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"COL_P_1","deletedat":"2018-08-24 15:30:48.136"}
>
>perl -ne ' s/^\s*$//g; print if length($_) > 0 and not m/\x22ServerSerial\x22:\x220\x22,\x22ServerName\x22:\x22\x22/' regex_event.dat
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1","deletedat":"2018-08-24 15:30:48.136"},
{"eventType":"delete","ServerSerial":"1142691750","ServerName":"COL_P_1","deletedat":"2018-08-24 15:30:48.136"}
>
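Answer 5 (score: 0)
As of NiFi 1.7.0 (via NIFI-4456), you can configure a JsonTreeReader to read input formatted as "one JSON per line". You can then use QueryRecord to issue SQL queries that route the records however you like. For example, add an "invalid" property with the query
SELECT * FROM FLOWFILE WHERE ServerSerial = 0 AND ServerName = ""
and a "valid" property with the query
SELECT * FROM FLOWFILE WHERE ServerSerial <> 0 OR ServerName <> ""
and so on.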
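The record-based idea (parse each line, then route on field values instead of pattern-matching raw text) can be approximated in plain Python. This sketch is not NiFi's QueryRecord or JsonTreeReader. It assumes one JSON object per line, tolerates the trailing commas seen in the sample data, and compares ServerSerial as a string because that is how it appears in the sample. The function and variable names are hypothetical.

```python
import json


def split_records(text: str):
    """Parse one JSON object per line and route each record to valid or invalid."""
    valid, invalid = [], []
    for line in text.splitlines():
        line = line.strip().rstrip(",")  # the sample lines end with a comma
        if not line:
            continue  # blank lines carry no record
        rec = json.loads(line)
        # Mirrors the "invalid" query: serial is 0 AND name is empty.
        is_invalid = rec["ServerSerial"] == "0" and rec["ServerName"] == ""
        (invalid if is_invalid else valid).append(rec)
    return valid, invalid


sample_text = "\n".join([
    '{"eventType":"delete","ServerSerial":"1142691750","ServerName":"XYZ_P_O"},',
    '{"eventType":"delete","ServerSerial":"0","ServerName":""},',
    '',
    '{"eventType":"delete","ServerSerial":"1142691950","ServerName":"ABC_P_1"}',
])
valid_records, invalid_records = split_records(sample_text)
print(len(valid_records), len(invalid_records))
```

Working on parsed records means blank lines and key order stop mattering, which is the main advantage over the regex-based answers.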