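Using grep to return multiple matches of a string variant from each line

Asked: 2014-04-16 12:46:03

Tags: bash awk grep

I have a file that contains database sequence names.

They come in the following two forms: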

@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")

I want to return MY1_SEQ and MY2_SEQ.

If I grep for _SEQ, I get the whole line.

I have tried using awk:

grep SEQ * | awk '{print $7}'

But that does not cope with the fact that each line may be laid out slightly differently.
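To make the problem concrete, here is a small sketch that prints the number of whitespace-separated fields and the content of field 7 for each of the two sample lines. The temporary file and the helper name show_field_seven are assumptions made for illustration only.

```shell
# Sample input copied from the question; the temporary file is an assumption.
sample=$(mktemp)
printf '%s\n' \
    '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' \
    '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' > "$sample"

# Hypothetical helper: print the field count and field 7 (in brackets) per line.
show_field_seven() {
    awk '{ print NF, "[" $7 "]" }' "$@"
}

show_field_seven "$sample"
```

The first line has seven fields and the second only six, so $7 is empty for the second line, and even where it exists it still carries the quotes and the closing parenthesis.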

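I want to return the whole (space-delimited) word that matches _SEQ.

How can I do that?

5 answers:

Answer 0: (score: 3)

You only need to tweak the grep pattern slightly and use -o to return just the matching part: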

$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| grep -E -o 'M.._SEQ(UENCE)?'
My1_SEQUENCE
MY1_SEQ
My2_SEQUENCE
MY2_SEQ

Or, if you only want the second one:

$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| grep -E -o 'M.._SEQ'
MY1_SEQ
MY2_SEQ

Or, more generally, if you want any xxx_SEQ:

$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| grep -E -o '[^ "]+_SEQ\b'
MY1_SEQ
MY2_SEQ
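The \b word boundary used in the last pattern is a GNU extension to extended regular expressions. As a hedged alternative, the sketch below anchors on the closing double quote instead, which rejects My1_SEQUENCE because _SEQ is not directly followed by a quote there. It assumes the names consist of letters, digits, and underscores and are always double-quoted; the helper name extract_seq_names and the temporary file are illustrative only.

```shell
# Sample input copied from the question; the temporary file is an assumption.
sample=$(mktemp)
printf '%s\n' \
    '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' \
    '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' > "$sample"

# Hypothetical helper: print every double-quoted name that ends in _SEQ.
extract_seq_names() {
    # Match NAME_SEQ" (including the closing quote), then drop the quote.
    grep -oE '[A-Za-z0-9_]+_SEQ"' "$@" | tr -d '"'
}

extract_seq_names "$sample"
```

Note that -o itself is not part of POSIX grep, although both GNU and BSD grep provide it.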

Answer 1: (score: 2)

grep -Po '(?<=sequenceName = ")[^"]*' filename
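The -P option enables Perl-compatible regular expressions, and (?<=sequenceName = ") is a lookbehind: it requires that text to precede the match without including it in the output. Since -P is only available in GNU grep builds with PCRE support, here is a rough equivalent with sed, under the assumption that each line has at most one sequenceName attribute. The helper name sequence_names and the temporary file are made up for this sketch.

```shell
# Sample input copied from the question; the temporary file is an assumption.
sample=$(mktemp)
printf '%s\n' \
    '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' \
    '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' > "$sample"

# Hypothetical helper: print the quoted value that follows sequenceName = .
sequence_names() {
    # -n suppresses default output; the p flag prints only lines where the
    # substitution succeeded, leaving just the captured group \1.
    sed -n 's/.*sequenceName = "\([^"]*\)".*/\1/p' "$@"
}

sequence_names "$sample"
```

Like the grep -Po version, this keys on the attribute name rather than on the _SEQ suffix, so it returns the value whatever it is called.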

Answer 2: (score: 0)

If you use ack (http://beyondgrep.com), you can do this:

ack -o 'MY\d_SEQ\b' filename

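Answer 3: (score: 0)

If you always want the last field, awk provides a variable named NF that holds the number of fields on the current line, so $NF retrieves the last one.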

$ awk '{gsub(/[")]/,"",$NF);print $NF}' file
MY1_SEQ
MY2_SEQ

gsub is used here to strip the quotes and the closing parenthesis.
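To see what gsub is doing, this sketch prints the last field before and after the substitution for one of the sample lines. The helper name show_last_field is hypothetical, and the approach only works as long as sequenceName stays the last attribute on the line.

```shell
# One sample line copied from the question.
line='@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")'

# Hypothetical helper: show the last field before and after cleaning.
show_last_field() {
    awk '{
        print "raw: " $NF          # last whitespace-separated field, untouched
        gsub(/[")]/, "", $NF)      # delete every double quote and ) in that field
        print "cleaned: " $NF
    }' "$@"
}

printf '%s\n' "$line" | show_last_field
```

The raw field is "MY2_SEQ") with the quotes and parenthesis attached; after gsub only MY2_SEQ remains.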

Answer 4: (score: 0)

# Requires GNU awk: the three-argument form of match() is a gawk extension.
gawk '{match($0, /MY.*_SEQ/, arr); print arr[0]}' input.txt

Input:

@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")

Output:

MY1_SEQ
MY2_SEQ
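For awk implementations without the array argument to match(), the same idea can be sketched with the RSTART and RLENGTH variables that POSIX awk sets after a successful match. This version also swaps .* for [^"]* so the match cannot run past the closing quote; that change, the helper name first_my_seq, and the temporary file are assumptions of this sketch rather than part of the answer above.

```shell
# Sample input copied from the question; the temporary file is an assumption.
sample=$(mktemp)
printf '%s\n' \
    '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")' \
    '@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' > "$sample"

# Hypothetical helper: print the first MY..._SEQ match on each line.
first_my_seq() {
    # match() returns 0 when nothing matches, so non-matching lines are skipped.
    awk 'match($0, /MY[^"]*_SEQ/) { print substr($0, RSTART, RLENGTH) }' "$@"
}

first_my_seq "$sample"
```

The pattern is case-sensitive, which is why "My1_SEQUENCE" is ignored and only the upper-case MY names are returned.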