I have some lines in my log file, for example:
www_liferay.log.2016-04-06-09:09:28:11|88.196.217.216+Su3XAR2l+LR15563413|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=10953115 action=MultiSimPackageChangeController.addContract|
www_liferay.log.2016-04-06-09:09:28:23|88.196.217.216+8vNPzjWX+LR15563413|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=10953119 action=MultiSimPackageChangeController.addContract|
www_liferay.log.2016-04-06-09:09:36:08|88.196.217.216+09ROHqBk+LR15563413|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=10953119 action=MultiSimPackageChangeController.addContract|
www_liferay.log.2016-04-06-10:10:14:50|62.65.33.194+cIvtH8Ju+LR2132626|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=12229566 action=MultiSimPackageChangeController.addContract|
I need to build a grep/awk/sed command that produces this output:
09:28:11|88.196.217.216 LR15563413 susgRefNum=10953115
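So should I use something like this?
first column (09:28:11|88.196.217.216): the data between the first ":" and the first "+"
second column (LR15563413): the data between the last "+" and the following "|"
third column (susgRefNum=10953115): the susgRefNum=... token, delimited by spaces
The timestamp and IP address can change, and so can the LRxxxxx number, so none of them are constants.
To get the first column, I use this: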
awk -F: '{print $2 ":" $3 ":" $4}' testfile.txt | head -1 | awk -F+ '{print $1}' | head -1
and get something like:
09:28:11|88.196.217.216
If you can, please explain the flags/options you use.
Thanks!
Answer 0 (score: 1)
Try this tested version:
sed -n '{ /^.*:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][|][0-9.]*[+].*[+]LR[0-9]*[|].*susgRefNum=[0-9]*[^0-9].*$/ { s/^.*:\([0-9][0-9]:[0-9][0-9]:[0-9][0-9][|][0-9.]*\)[+].*[+]\(LR[0-9]*\)[|].*\(susgRefNum=[0-9]*\)[^0-9].*$/\1 \2 \3/ p } }'
The -n option tells sed not to print a line unless the p command is used.
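A minimal sketch of what -n changes, using a made-up two-line input rather than the log from the question:

```shell
# Without -n, sed prints every line automatically, and p prints
# matching lines a second time (so "beta" appears twice here).
printf 'alpha\nbeta\n' | sed '/beta/p'

# With -n, automatic printing is off; only lines printed by p appear.
only_matching=$(printf 'alpha\nbeta\n' | sed -n '/beta/p')
printf '%s\n' "$only_matching"
```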
sed reads its input line by line and works with regular expressions.
The first regular expression selects the lines that match your pattern:
^.*:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][|][0-9.]*[+]
The line should start with any sequence of characters, followed by :HH:MM:SS|IP+
.*[+]LR[0-9]*[|]
It should continue with any sequence of characters up to +LRnnnnn| (n is a digit, written [0-9] in the regexp)
.*susgRefNum=[0-9]*[^0-9]
It should continue with any sequence of characters up to susgRefNum=nnnnnn, followed by one non-digit character
.*$
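It can then end with any sequence of characters
If the line that was read (stored in the pattern buffer) matches, the s command (search and replace) modifies the pattern buffer and removes all the unwanted character sequences.
The s command: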
s/regexp/replacement/flags
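\( and \) mark a specific sequence inside the regular expression used with s. That sequence can be referenced in the replacement part as \1. If several sequences are marked, they are available as \1, \2 and so on.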
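A small sketch of \( \) and \1 \2 on a hypothetical sample string (not taken from the log), swapping the two captured parts:

```shell
# Hypothetical input, chosen only to illustrate capture groups.
sample='key=42;name=demo'

# Group 1 captures the digits after "key=", group 2 the text after "name=".
# The replacement emits group 2 first, then group 1.
swapped=$(printf '%s\n' "$sample" | sed 's/^key=\([0-9]*\);name=\(.*\)$/\2 \1/')
printf '%s\n' "$swapped"
```

Everything the regular expression matches outside the groups is dropped, which is how the command above strips the unwanted parts of each log line.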
Finally, the resulting pattern buffer is printed with p.
Lines that do not match are not printed.
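As a more compact sketch of the same idea: the p flag on the s command prints the pattern buffer only when a substitution was made, so the separate selecting expression can be left out. The regular expression here is a simplified variant written for illustration, not the one from the answer; the input is one line from the question plus a non-matching line.

```shell
line='www_liferay.log.2016-04-06-09:09:28:11|88.196.217.216+Su3XAR2l+LR15563413|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=10953115 action=MultiSimPackageChangeController.addContract|'
other='another line / another format'

# -n turns off automatic printing; the trailing p prints only lines
# on which the substitution succeeded, so "other" is skipped.
# In a basic regular expression, + and | are ordinary characters.
result=$(printf '%s\n' "$line" "$other" |
  sed -n 's/^[^:]*:\([0-9:]*|[0-9.]*\)+.*+\(LR[0-9]*\)|.*\(susgRefNum=[0-9]*\)[^0-9].*$/\1 \2 \3/p')
printf '%s\n' "$result"
```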
Tested as follows:
$ cat myfile.log
www_liferay.log.2016-04-06-09:09:28:11|88.196.217.216+Su3XAR2l+LR15563413|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=10953115 action=MultiSimPackageChangeController.addContract|
www_liferay.log.2016-04-06-09:09:28:23|88.196.217.216+8vNPzjWX+LR15563413|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=10953119 action=MultiSimPackageChangeController.addContract|
www_liferay.log.2016-04-06-09:09:36:08|88.196.217.216+09ROHqBk+LR15563413|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=10953119 action=MultiSimPackageChangeController.addContract|
www_liferay.log.2016-04-06-10:10:14:50|62.65.33.194+cIvtH8Ju+LR2132626|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=12229566 action=MultiSimPackageChangeController.addContract|
another line / another format
$ sed -n '{ /^.*:[0-9][0-9]:[0-9][0-9]:[0-9][0-9][|][0-9.]*[+].*[+]LR[0-9]*[|].*susgRefNum=[0-9]*[^0-9].*$/ { s/^.*:\([0-9][0-9]:[0-9][0-9]:[0-9][0-9][|][0-9.]*\)[+].*[+]\(LR[0-9]*\)[|].*\(susgRefNum=[0-9]*\)[^0-9].*$/\1 \2 \3/ p } }' myfile.log
09:28:11|88.196.217.216 LR15563413 susgRefNum=10953115
09:28:23|88.196.217.216 LR15563413 susgRefNum=10953119
09:36:08|88.196.217.216 LR15563413 susgRefNum=10953119
10:14:50|62.65.33.194 LR2132626 susgRefNum=12229566
Answer 1 (score: 1)
sed -r 's/^[^:]*:([^+]*).*\+([^|]*).*(susgRefNum=[^ ]*).*/\1 \2 \3/g' file
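^[^:]*: -> matches everything up to and including the first :
([^+]*) -> captures the string up to the next +, e.g. 09:28:11|88.196.217.216
.*\+ -> matches up to the last + on the line (.* is greedy)
([^|]*) -> captures the string up to the next |, e.g. LR15563413
.* -> matches everything up to susgRefNum=
(susgRefNum=[^ ]*) -> captures the string up to the next space, e.g. susgRefNum=10953115
\1 \2 \3 -> prints what was captured in the three ( ) groups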
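To see why the second group ends up holding the LR number and not the middle token, here is a small sketch on just the IP+token+LR part of a sample line. It assumes a sed that accepts -E (GNU and BSD sed both do; -r is the GNU spelling of the same option).

```shell
# Only the middle portion of a log line, to keep the example short.
part='88.196.217.216+Su3XAR2l+LR15563413|INFO'

# ([^+]*) stops at the first "+". The greedy .* then runs to the last "+",
# so ([^|]*) starts right after it and captures the LR number.
picked=$(printf '%s\n' "$part" | sed -E 's/^([^+]*).*\+([^|]*).*/\1 \2/')
printf '%s\n' "$picked"
```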
Answer 2 (score: 0)
awk has many string manipulation functions that you can use:
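awk -F'|' '{
  # $2 is "IP+token+LRnnn": keep the IP, then delete through the last "+" to leave LRnnn
  ip=substr($2, 1, index($2,"+")-1); sub(/.*\+/, "", $2);
  # $5 is "render ... susgRefNum=n action=...": drop the last word, then everything up to the last space
  sub(/ [^ ]*$/, "", $5); sub(/.* /,"",$5);
  # the part of $1 after its first ":" is the HH:MM:SS time
  printf "%s|%s %s %s\n",substr($1, index($1,":")+1),ip,$2,$5;
}' file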
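A minimal sketch of the three functions in isolation, run on the second |-separated field of the first sample line. The variable names pos, ip and rest are only illustrative.

```shell
field='88.196.217.216+Su3XAR2l+LR15563413'

parts=$(printf '%s\n' "$field" | awk '{
    pos = index($0, "+")            # 1-based position of the first "+"
    ip  = substr($0, 1, pos - 1)    # the characters before that "+"
    rest = $0
    sub(/.*[+]/, "", rest)          # greedy: deletes through the last "+"
    print pos, ip, rest
}')
printf '%s\n' "$parts"
```

index returns 0 when the character is not found, so a line without a "+" would produce an empty ip here; the sketch does not guard against that.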
Answer 3 (score: 0)
awk to the rescue!
$ awk -F'[|+]| +' '{sub(/[^:]*:/,"",$1); print $1"|"$2,$4,$10}' file
09:28:11|88.196.217.216 LR15563413 susgRefNum=10953115
09:28:23|88.196.217.216 LR15563413 susgRefNum=10953119
09:36:08|88.196.217.216 LR15563413 susgRefNum=10953119
10:14:50|62.65.33.194 LR2132626 susgRefNum=12229566
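Explanation
-F'[|+]| +'
sets the field separator to |, + or a run of spaces. With this FS most of the manual splitting goes away.
sub(/[^:]*:/,"",$1)
removes everything up to and including the first colon (the file name and date prefix) from field 1, leaving only the time.
print $1"|"$2,$4,$10
prints the selected fields. The default output separator is a space; between the first and second fields it is set to | explicitly.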
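To check which field numbers this separator produces, a quick sketch is to print every field of one sample line with its index:

```shell
line='www_liferay.log.2016-04-06-09:09:28:11|88.196.217.216+Su3XAR2l+LR15563413|INFO |e.e.p.i.RequestLogInterceptor|render MultiSimPackageChangeController(POST) susgRefNum=10953115 action=MultiSimPackageChangeController.addContract|'

# Same field separator as in the answer; print "index=[value]" for each field.
fields=$(printf '%s\n' "$line" |
  awk -F'[|+]| +' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"
```

Field 6 comes out empty because "INFO" is followed by a space and then a |, which are two adjacent separators. That is why the susgRefNum token is field 10 and not field 9.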