I have a file containing database sequence names.
They come in the following two forms:
@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")
I want to get back MY1_SEQ and MY2_SEQ.
If I grep for _SEQ, I get the whole line.
I tried using awk:
grep SEQ * | awk '{print $7}'
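But that does not cope with the fact that each line may be slightly different, so the field number is not fixed.
How can I return only the whole word (delimited by spaces) that matches _SEQ?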
Answer 0 (score: 3)
You just need to tweak your grep pattern slightly and use -o to return only the matching part:
$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| egrep -o 'M.._SEQ(UENCE)?'
My1_SEQUENCE
MY1_SEQ
My2_SEQUENCE
MY2_SEQ
Or, if you only want the second one:
$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| egrep -o 'M.._SEQ\b'
MY1_SEQ
MY2_SEQ
Or, more generally, if you want any xxx_SEQ:
$ echo '@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")' \
| egrep -o '[^ "]+_SEQ\b'
MY1_SEQ
MY2_SEQ
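The question runs grep over several files (`grep SEQ *`), and in that case grep prefixes every match with the file name. A minimal sketch, assuming GNU grep and a hypothetical helper name `seq_names`, that reuses the general pattern above and suppresses the prefix with -h:

```shell
# Hypothetical helper: print every word ending in _SEQ from the given files.
# -E: extended regex, -o: print only the match, -h: omit file-name prefixes.
# \b is a GNU extension (BSD grep accepts it too), not part of POSIX ERE.
seq_names() {
    grep -Eoh '[^ "]+_SEQ\b' "$@"
}
```

It could be called as `seq_names *.java | sort -u` to get each sequence name once; the `*.java` glob is only an assumption about where the annotations live.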
Answer 1 (score: 2)
grep -Po '(?<=sequenceName = ")[^"]*' filename
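The lookbehind in this command has to be fixed-width, so it only matches when the source is spelled exactly `sequenceName = "` with one space on each side of the equals sign. A sketch of a more tolerant variant, assuming GNU grep built with PCRE support (-P) and a hypothetical helper name `seq_from_attr`:

```shell
# Assumes GNU grep with -P (PCRE) support.
# \K drops everything matched so far from the reported match, so the
# spacing around "=" may vary, which a fixed-width lookbehind cannot allow.
seq_from_attr() {
    grep -Po 'sequenceName\s*=\s*"\K[^"]*' "$@"
}
```

Because it keys on the attribute name rather than on the _SEQ suffix, this approach also returns sequence names that do not follow the naming convention.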
Answer 2 (score: 0)
If you use ack (http://beyondgrep.com), you can do this:
ack -o -w 'MY\d_SEQ' filename
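Answer 3 (score: 0)
If you always want the last field, awk provides a variable named NF, the number of fields in the current record, so $NF retrieves the last field.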
$ awk '{gsub(/[")]/,"",$NF);print $NF}' file
MY1_SEQ
MY2_SEQ
Using gsub, we remove the quotes and parentheses.
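The last-field approach relies on the closing quote and parenthesis being attached to the sequence name with no space in between. A minimal sketch that splits on double quotes instead, assuming sequenceName is always the last quoted attribute on the line (`last_quoted` is a hypothetical helper name):

```shell
# Split fields on the double quote; the last quoted string is then
# the second-to-last field, so no cleanup with gsub is needed.
# Assumes sequenceName is the final quoted attribute on each line.
# NF >= 3 skips lines that contain no quoted string at all.
last_quoted() {
    awk -F'"' 'NF >= 3 { print $(NF-1) }' "$@"
}
```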
Answer 4 (score: 0)
awk '{match($0, /MY.*_SEQ/,arr); print arr[0]}' input.txt
Input:
@SequenceGenerator(allocationSize=1, name = "My1_SEQUENCE", sequenceName = "MY1_SEQ")
@SequenceGenerator(name = "My2_SEQUENCE", sequenceName = "MY2_SEQ")
Output:
MY1_SEQ
MY2_SEQ
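Note that the three-argument form of match() used in this answer is a gawk extension. A sketch of a portable equivalent, keeping the answer's regex and using the RSTART and RLENGTH variables that POSIX awk sets (`match_seq` is a hypothetical helper name):

```shell
# Portable variant: match() sets RSTART and RLENGTH in any POSIX awk,
# so substr() can pull out the matched text without gawk's array argument.
# Lines without a match are skipped instead of printing an empty line.
match_seq() {
    awk 'match($0, /MY.*_SEQ/) { print substr($0, RSTART, RLENGTH) }' "$@"
}
```

The regex is case-sensitive and greedy, so it works here only because the generator names start with "My" while the sequence names start with "MY".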