How to extract the text that follows a keyword in UNIX

Date: 2019-05-29 09:54:41

Tags: unix text awk sed text-extraction

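I have a text file (file.txt) that contains a bunch of results received from an external source (no newlines, spaces, etc.). In this file I need to find every occurrence of the word serId and print the alphanumeric sequence that follows it. The sequence can be of any length, but it always ends with a comma (,). How can I extract these alphanumeric sequences?

I searched for sed / awk scripts that do this, but the results all seem to deal with finding a known sequence rather than an unknown one.

For example, I want to extract 28655784-EE from the following sample text: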

{"preRollbackCheckResults":[],"patchingHistory":[{"backupStatus":"Available","rollbackStatus":"Available","additionalNote":"Patching CDS as planned","appliedBy":"xxrbsgCDS02services","appliedDate":"2019-01-18T12:45:33.926+0000","totalTime":"29 min, 47 sec","serId":"28655784-EE","patchDescription":"DB 18.4.0.0.0 Oct 2018 PSU

4 answers:

Answer 0: (score: 2)

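Try the following awk script (gawk only, because it uses the three-argument form of match):

awk -v RS=',' 'match($0, /"serId":"([^"]*)/, m) {print m[1]}' input.txt

If you need the terminating comma:

awk -v RS=',' 'match($0, /"serId":"([^"]*)/, m) {print m[1] ","}' input.txt

Explanation:

-v RS=',' splits the input into records separated by ,

match($0, /"serId":"([^"]*)/, m) tests the current record for "serId":" followed by everything up to the next double quote, and stores the result in the array m

print m[1] prints the text captured by the parenthesized group, i.e. the value without the surrounding quotes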
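
Where gawk is not available, the same record-splitting idea can be sketched with POSIX awk features only (index and substr instead of the three-argument match). This is a minimal sketch, not part of the original answer: the function name extract_serid_awk is hypothetical, and it assumes each value is wrapped in double quotes as in the sample.

```shell
# Hypothetical helper: print every serId value found in the file given as $1.
extract_serid_awk() {
    awk -v RS=',' '
        # Act only on comma-delimited records that contain the key.
        (pos = index($0, "\"serId\":\"")) {
            rest = substr($0, pos + 9)   # skip the 9 characters of "serId":"
            end = index(rest, "\"")      # position of the closing quote
            if (end > 0) print substr(rest, 1, end - 1)
        }
    ' "$1"
}
```

Calling extract_serid_awk file.txt prints one value per line, so several occurrences on the same input line are all reported.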

Answer 1: (score: 1)

grep -o is a very simple solution:

I created a file containing the following lines:

serId12345
serIdABCde123;
Ser_idblabla;

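The first line does not end with a semicolon and the third line starts with the wrong word, so only the second line should match.

I ran the following command: grep -o "serId[0-9a-zA-Z]*;" testtttt.txt, which produced: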

serIdABCde123;
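
The test file above uses a semicolon as terminator, whereas the sample in the question has quoted values such as "serId":"28655784-EE". A possible adaptation to that format is sketched below. It assumes a grep that supports -o (GNU, BSD and BusyBox grep do, although the option is not required by POSIX), and the function name extract_serid_grep is made up for illustration.

```shell
# Hypothetical helper: print every serId value found in the file given as $1.
extract_serid_grep() {
    # grep -o emits each match ("serId":"VALUE") on its own line;
    # cut then splits on double quotes, which makes VALUE the 4th field.
    grep -o '"serId":"[^"]*"' "$1" | cut -d '"' -f 4
}
```

Because grep -o prints every match separately, this also copes with input that has no newlines at all.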

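Answer 2: (score: 0)

Based on the short sample you posted in the comments, I have two suggestions:

  • If the file is well-formed JSON, try to understand its structure and use jq

  • If it is not well-formed and can only be treated as a blob of text, use the following Perl: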

    perl -lne '@m=/"serId":"([^"]+)"/g; print "@m"' file.txt
    

    Test run:

    $ cat file.txt
    {"preRollbackCheckResults":[],"patchingHistory":[{"backupStatus":"Available","rollbackStatus":"Available","additionalNote":"Patching CDS as planned","appliedBy":"xxrbsgCDS02services","appliedDate":"2019-01-18T12:45:33.926+0000","totalTime":"29 min, 47 sec","serId":"28655784-EE","patchDescription":"DB 18.4.0.0.0 Oct 2018 PSU{"preRollbackCheckResults":[],"patchingHistory":[{"backupStatus":"Available","rollbackStatus":"Available","additionalNote":"Patching CDS as planned","appliedBy":"xxrbsgCDS02services","appliedDate":"2019-01-18T12:45:33.926+0000","totalTime":"29 min, 47 sec","serId":"28655784-EE","patchDescription":"DB 18.4.0.0.0 Oct 2018 PSU
    
    $ perl -lne '@m=/"serId":"([^"]+)"/g; print "@m"' file.txt
    28655784-EE 28655784-EE
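
For the first suggestion, a jq filter might look like the sketch below. It is only an illustration: the path .patchingHistory[].serId follows the key names visible in the sample, but the sample is truncated, so the filter assumes the real file is complete, valid JSON with that layout. The function name extract_serid_jq is hypothetical.

```shell
# Hypothetical helper: print the serId of every patchingHistory entry in $1.
# Assumes $1 is valid JSON shaped like the (truncated) sample in the question.
extract_serid_jq() {
    # -r prints raw strings, i.e. without the surrounding JSON quotes.
    jq -r '.patchingHistory[].serId' "$1"
}
```

If jq reports a parse error, the file is not valid JSON and the text-based Perl approach is the one to use.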
    

Answer 3: (score: 0)

Using any sed:

$ sed 's/.*"serId":"\([^"]*\).*/\1/' file
28655784-EE
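
The command above prints a single value per input line (the last one, since the leading .* is greedy) and echoes lines without a match unchanged. If the file holds several serId entries on one line, one possible variation is to break the input on commas first and let sed print only the lines where the substitution succeeded. This is a sketch under the assumption that commas never occur inside a serId value; the function name extract_serid_sed is hypothetical.

```shell
# Hypothetical helper: print every serId value found in the file given as $1.
extract_serid_sed() {
    # tr puts each comma-separated chunk on its own line;
    # sed -n with the p flag prints only lines where the pattern matched.
    tr ',' '\n' < "$1" | sed -n 's/.*"serId":"\([^"]*\)".*/\1/p'
}
```

Run against a file with two entries, extract_serid_sed file prints both values, one per line.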