I have the following file:
cat file.txt
ID Location
MNS1 NC_000004.12:g.d.a144120555T>C;NC_001423.23:c.a144120513G<C
MNS2 NC_000142.12:g.a144120552C,N>D
MNS3 NC_000142.12:g.a144120559C>N
I want to transform the input into this:
ID Location
MNS1 NC_000004.12:144120555;NC_001423.23:144120513
MNS2 NC_000142.12:144120552
MNS3 NC_000142.12:144120559
I want to delete everything except the digits between : and ;
For example, I tried:
echo "NC_000004.12:g.d.a144120555T>C;" | sed 's/:[^0-9]*/:/g; s/[^0-9]*;/;/g; s/[^0-9]*$//g'
Expected output:
NC_000004.12:144120555
Answer 0 (score: 2)
This might work!
sed -i.bak 's/g\.//g; s/\w>\w//g' filename
As for the (NC.*?): concat, a bit more explanation of the expected final output would help, though this might work:
s/NC[0-9]?:/:/
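On the sample lines, the first command above still leaves fragments such as d.a and G<C behind, so here is a minimal sketch of a single sed expression that seems to produce the expected output from the question. It assumes every location has the shape accession:prefix+digits+suffix with locations separated by ; and it uses -E (extended regular expressions, accepted by both GNU and BSD sed). The function name strip_to_position is hypothetical and only there for illustration.

```shell
# strip_to_position: hypothetical helper name, not from the original answers.
strip_to_position() {
  # After each colon: skip non-digits (without crossing a semicolon),
  # capture the run of digits, then drop everything up to the next semicolon.
  sed -E 's/:[^0-9;]*([0-9]+)[^;]*/:\1/g' "$@"
}

# Feed the sample rows from the question through the helper.
printf '%s\n' \
  'ID Location' \
  'MNS1 NC_000004.12:g.d.a144120555T>C;NC_001423.23:c.a144120513G<C' \
  'MNS2 NC_000142.12:g.a144120552C,N>D' \
  'MNS3 NC_000142.12:g.a144120559C>N' |
  strip_to_position
```

Because the pattern only fires after a colon, the header line passes through unchanged. To edit a file in place with a backup, the same expression can be given to sed -i.bak as in the command above.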
Answer 1 (score: 1)
If Perl is an option:
cat file.txt
ID Location
MNS1 NC_000004.12:g.d.a144120555T>C;NC_001423.23:c.a144120513G<C
MNS2 NC_000142.12:g.a144120552C,N>D
MNS3 NC_000142.12:g.a144120559C>N
perl -ape 's/:\D+(\d+).*?(?=;|$)/:$1/g' file.txt
ID Location
MNS1 NC_000004.12:144120555;NC_001423.23:144120513
MNS2 NC_000142.12:144120552
MNS3 NC_000142.12:144120559
Explanation:
s/ # substitute
: # colon
\D+ # 1 or more non digits
(\d+) # group 1, 1 or more digits
.*? # 0 or more of any character but newline, not greedy
(?=;|$) # positive lookahead, make sure we have semi-colon or end of line
/ # with
: # colon
$1 # content of group 1 (i.e. the digits)
/g # end, global