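I have a text file (file.txt) containing a bunch of results received from an external source (no line breaks, no spaces, etc.). In this file I need to find every occurrence of the word serId and print the alphanumeric sequence that follows it. The sequence can be of any length, but it always ends with a comma (,). How can I extract these alphanumeric sequences?
I tried searching for sed/awk scripts, but the results all seem to revolve around finding a known sequence rather than an unknown one.
For example, I want to extract 28655784-EE from the following sample text: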
{"preRollbackCheckResults":[],"patchingHistory":[{"backupStatus":"Available","rollbackStatus":"Available","additionalNote":"Patching CDS as planned","appliedBy":"xxrbsgCDS02services","appliedDate":"2019-01-18T12:45:33.926+0000","totalTime":"29 min, 47 sec","serId":"28655784-EE","patchDescription":"DB 18.4.0.0.0 Oct 2018 PSU
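Answer 0 (score: 2)
Try the following awk script (gawk only, because it relies on gawk's three-argument form of match):
awk -v RS=',' 'match($0, /serId":"([^"]*)/, m) { print m[1] }' input.txt
If you also need the terminating comma:
awk -v RS=',' 'match($0, /serId":"([^"]*)/, m) { print m[1] "," }' input.txt
Explanation:
-v RS=','
Parses the file into records separated by ,
match($0, /serId":"([^"]*)/, m)
Tests the current record for serId":" followed by a run of characters up to the next ". The matched text is stored in the array m: m[0] holds the whole match and m[1] holds the parenthesized value.
print m[1]
Prints only the captured value, without the serId":" prefix.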
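Where gawk is not available, the same idea can probably be expressed with the two-argument match and the RSTART and RLENGTH variables that POSIX awk provides. The following is only a sketch: the function name extract_serids and the shortened sample input are made up for illustration, and it assumes every value is wrapped in double quotes as in the sample from the question.

```shell
# Hypothetical helper: print every serId value found on stdin, one per line.
# Uses only POSIX awk features (match, RSTART, RLENGTH, substr).
extract_serids() {
  awk '{
    rest = $0
    # Repeatedly find the next "serId":"<value>" in what is left of the line.
    while (match(rest, /"serId":"[^"]*"/)) {
      # Skip the 9-character prefix "serId":" and drop the closing quote.
      print substr(rest, RSTART + 9, RLENGTH - 10)
      # Continue searching after the match just reported.
      rest = substr(rest, RSTART + RLENGTH)
    }
  }'
}

# Usage with a shortened, made-up sample (not the real file):
printf '%s\n' '{"a":"x","serId":"28655784-EE","b":"y"}{"serId":"ABC123","c":"z"}' | extract_serids
# prints:
# 28655784-EE
# ABC123
```

Because the loop walks along the line itself, this variant does not need to change the record separator and still reports every occurrence on a single long line.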
Answer 1 (score: 1)
grep -o is a very simple solution.
I created a file containing the following lines:
serId12345
serIdABCde123;
Ser_idblabla;
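The first line does not end with a semicolon and the third line starts with the wrong word, so only the second line is a valid match.
I ran the following command:
grep -o "serId[0-9a-zA-Z]*;" testtttt.txt
and got this result: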
serIdABCde123;
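The test above uses a simplified format with a semicolon terminator. For the quoted "serId":"..." form shown in the question, the same grep -o approach might look like the sketch below. The function name extract_serids_grep and the shortened sample are illustrative only, and -o is an extension found in GNU and BSD grep rather than a POSIX option.

```shell
# Hypothetical helper: list every serId value on stdin, one per line.
extract_serids_grep() {
  # -o prints each match on its own line, e.g. "serId":"28655784-EE".
  # cut then splits that match on double quotes; field 4 is the value itself.
  grep -o '"serId":"[^"]*"' | cut -d '"' -f 4
}

# Usage with a shortened, made-up sample (not the real file):
printf '%s\n' '{"a":"x","serId":"28655784-EE","b":"y"}{"serId":"ABC123","c":"z"}' | extract_serids_grep
# prints:
# 28655784-EE
# ABC123
```

Since grep -o emits every match separately, this also works when the whole file is a single line with many serId entries.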
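Answer 2 (score: 0)
Based on the short example you posted in the comments, I have two suggestions:
If the data is not well-formed and can only be treated as a blob of text, use the following Perl one-liner: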
perl -lne '@m=/"serId":"([^"]+)"/g; print "@m"' file.txt
Test run:
$ cat file.txt
{"preRollbackCheckResults":[],"patchingHistory":[{"backupStatus":"Available","rollbackStatus":"Available","additionalNote":"Patching CDS as planned","appliedBy":"xxrbsgCDS02services","appliedDate":"2019-01-18T12:45:33.926+0000","totalTime":"29 min, 47 sec","serId":"28655784-EE","patchDescription":"DB 18.4.0.0.0 Oct 2018 PSU{"preRollbackCheckResults":[],"patchingHistory":[{"backupStatus":"Available","rollbackStatus":"Available","additionalNote":"Patching CDS as planned","appliedBy":"xxrbsgCDS02services","appliedDate":"2019-01-18T12:45:33.926+0000","totalTime":"29 min, 47 sec","serId":"28655784-EE","patchDescription":"DB 18.4.0.0.0 Oct 2018 PSU
$ perl -lne '@m=/"serId":"([^"]+)"/g; print "@m"' file.txt
28655784-EE 28655784-EE
Answer 3 (score: 0)
With any sed:
$ sed 's/.*"serId":"\([^"]*\).*/\1/' file
28655784-EE
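Note that the leading .* in this command is greedy, so it keeps only the last serId on each line, and lines without a serId are printed unchanged. If the file is one long line with several entries, one possible workaround is to split the input at commas first and let sed print only the lines where the substitution succeeded. This is a sketch: the function name extract_serids_sed and the sample are made up, and it assumes the serId values themselves never contain a comma.

```shell
# Hypothetical helper: print every serId value on stdin, one per line.
extract_serids_sed() {
  # Break the input at commas so each "key":"value" pair is on its own line,
  # then print only the lines where the substitution succeeded (-n plus p).
  tr ',' '\n' | sed -n 's/.*"serId":"\([^"]*\)".*/\1/p'
}

# Usage with a shortened, made-up sample (not the real file):
printf '%s\n' '{"a":"x","serId":"28655784-EE","b":"y"}{"serId":"ABC123","c":"z"}' | extract_serids_sed
# prints:
# 28655784-EE
# ABC123
```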