Splitting a large block of text with a regular expression in Bash

Date: 2018-08-08 13:36:13

Tags: regex bash logging split

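I have a large block of text that I want to split. This is difficult because it is technically all one line. The text consists of unformatted logged messages from a network device, and the only way to tell where one message ends is that every message begins with '.{5}\d{7}', for example <186>1093281. How can I read this string, which is saved in a file named "textLog", and split it on that regular expression into new strings or an array for clean output?

Example input: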

<189>795307: Aug  8 11:41:38 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted<189>795308: Aug  8 11:41:39 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed<189>795309: Aug  8 11:41:45 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted<189>795310: Aug  8 11:41:46 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed<189>795311: Aug  8 11:41:52 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted<189>795312: Aug  8 11:41:53 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed<189>795313: Aug  8 11:41:59 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed<189>795314: Aug  8 11:42:05 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted

(It is formatted as one long string, not as multiple lines.)

Desired output: an array containing...

arr[0]=<189>795307: Aug  8 11:41:38 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted
arr[1]=<189>795308: Aug  8 11:41:39 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed
arr[2]=<189>795309: Aug  8 11:41:45 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted
...
arr[7]=<189>795314: Aug  8 11:42:05 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted 

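It doesn't have to be an array or be stored in a data structure; what I care about most is a way to split on the regular expression so that I can output or save the substrings.

1 answer:

Answer 0: (score: 1)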

Using GNU sed and Bash 4.0 or newer:

mapfile -t arr < <(sed -E 's/(.)(.{5}[[:digit:]]{6})/\1\n\2/g' infile)

The sed command looks for any character followed by a block of five characters and six digits (rather than the seven implied in the question), and inserts a newline after that first character. Requiring a preceding character excludes a match at the very start of the line, where we don't want to introduce a newline.

The result is then read into the array arr by mapfile via process substitution. The printf statement shows one array element per line:

$ printf '%s\n' "${arr[@]}"
<189>795307: Aug  8 11:41:38 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted
<189>795308: Aug  8 11:41:39 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed
<189>795309: Aug  8 11:41:45 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted
<189>795310: Aug  8 11:41:46 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed
<189>795311: Aug  8 11:41:52 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted
<189>795312: Aug  8 11:41:53 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed
<189>795313: Aug  8 11:41:59 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed
<189>795314: Aug  8 11:42:05 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted
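The six-digit pattern fits the sample input, but the question mentions seven-digit sequence numbers such as <186>1093281. The following is a minimal sketch of a stricter variant, assuming every message starts with a header of the form `<digits>digits: ` and that this form never occurs inside a message body. The sample data and the temporary file are made up for illustration; GNU sed is assumed for `\n` in the replacement.

```shell
#!/usr/bin/env bash
# Sketch: split on a stricter header pattern, "<digits>digits: ".
# Assumes GNU sed (for \n in the replacement) and Bash 4+ (for mapfile).

# Hypothetical sample data with 7-digit sequence numbers, all on one line.
infile=$(mktemp)
printf '%s' '<186>1093281: Aug  8 11:41:38 EDT: first message<186>1093282: Aug  8 11:41:39 EDT: second message<186>1093283: Aug  8 11:41:45 EDT: third message' > "$infile"

# (.) requires a character before the header, so the header at the very
# start of the line does not get a newline inserted in front of it.
mapfile -t arr < <(sed -E 's/(.)(<[0-9]+>[0-9]+: )/\1\n\2/g' "$infile")

# Print each element together with its index.
for i in "${!arr[@]}"; do
    printf 'arr[%d]=%s\n' "$i" "${arr[i]}"
done

rm -f "$infile"
```

Because the header is matched by shape rather than by a fixed length, this variant does not depend on how many digits the priority or the sequence number has.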

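Alternatively, based on the sample input, you can use grep to split the text into lines as follows:

grep -o '<[^<]*' infile

This assumes that every occurrence of < marks the start of a new log line.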
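Since the question asks for an array, the grep output can be captured the same way as the sed output. This is a minimal sketch under the same assumption that < appears only at the start of a message; the array name `lines` and the temporary file are chosen here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: collect the matches of grep -o into a Bash array (Bash 4+).

# Two messages from the sample input, joined on a single line.
infile=$(mktemp)
printf '%s' '<189>795307: Aug  8 11:41:38 EDT: %ILPOWER-5-POWER_GRANTED: Interface Gi1/0/8: Power granted<189>795308: Aug  8 11:41:39 EDT: %ILPOWER-5-IEEE_DISCONNECT: Interface Gi1/0/8: PD removed' > "$infile"

# grep -o prints every match on its own line; each match starts at a "<"
# and runs up to, but not including, the next "<".
mapfile -t lines < <(grep -o '<[^<]*' "$infile")

printf 'number of messages: %d\n' "${#lines[@]}"
printf '%s\n' "${lines[@]}"

rm -f "$infile"
```

If a message body could itself contain <, this pattern would split in the wrong place, so a stricter header pattern would be the safer choice in that case.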