I have a question.
I have a file named variants.txt containing this text:
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42132048_42132049insT";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42132048_42132049insTT";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42132048_42132049delTT";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131884_42131885insT";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131540_42131541delTC";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131420T>C";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131222G>A";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131145T>C";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131125C>G";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131122A>C";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131119G>A";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131118T>C";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131112G>C";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131111T>C";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131067G>A";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131066G>A";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131063G>A";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131059C>T";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131058C>G";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131023C>G";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131016T>C";
select chrom,chromStart,chromEnd,name from snp147 where name="rs138100349 ";
select chrom,chromStart,chromEnd,name from snp147 where name="rs118203758 ";
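If the last column (the one starting with name=) contains the substring g., I want to match it and, if so, print everything between g. and the trailing "; to another file.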
For example, given:
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42132048_42132049insT";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131125C>G";
I want:
42132048_42132049insT
42131125C>G
How can I do this?
Answer 0 (score: 2)
Try:
awk '{num=sub(/.*:g\./,"");num+=sub(/\".*/,"");if(num==2){print};num=""}' Input_file
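The one-liner above relies on sub() returning the number of substitutions it made (0 or 1 here): a line is printed only when both deletions succeed. Below is the same program spread over several lines with comments, run on a small sample taken from the question. The file names variants_sample.txt and extracted.txt are illustrative, and the output is redirected to a file since the question asks for another file. The num="" reset is dropped because num is reassigned on every line anyway.

```shell
# Illustrative three-line sample taken from the question.
cat > variants_sample.txt <<'EOF'
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42132048_42132049insT";
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131125C>G";
select chrom,chromStart,chromEnd,name from snp147 where name="rs138100349 ";
EOF

awk '{
  # sub() edits $0 in place and returns how many replacements it made.
  # Step 1: delete everything up to and including ":g." (greedy .*).
  num = sub(/.*:g\./, "")
  # Step 2: delete from the first double quote to the end of the line
  # (after a successful step 1 that is just the trailing ";).
  num += sub(/".*/, "")
  # Only lines where both deletions happened held a g. name.
  if (num == 2) print
}' variants_sample.txt > extracted.txt
```

For the rs line only step 2 succeeds (num is 1), so nothing is printed for it.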
Answer 1 (score: 1)
Choosing the input field separator regex (via -F) carefully gives a simple solution:
awk -F':g\.|";' 'NF>2 {print $2}' file
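The regex :g\.|"; splits each input line into fields at either the literal :g. or (|) the literal ";. A line of interest is therefore split into (at least) 3 fields, and the substring to extract ends up in the 2nd field ($2).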
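NF>2 matches only lines with at least 3 fields (NF is the number of fields), which ensures that lines without the substring of interest are ignored.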
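To see why NF>2 works as a filter, the sketch below prints NF and every field for one g. line and one rs line from the question, then runs the filter with its output saved to a file. As a small, hedged change it writes the dot as [.] instead of \., so the dot stays literal regardless of how a given awk processes backslash escapes in the -F argument. The file names are illustrative.

```shell
# Illustrative two-line sample: one g. line and one rs line from the question.
cat > variants_sample.txt <<'EOF'
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131125C>G";
select chrom,chromStart,chromEnd,name from snp147 where name="rs138100349 ";
EOF

# Print NF and each field in brackets to show where the separator regex cuts.
# g. line: NF=3, fields [...name="NC_000022.11] [42131125C>G] []
# rs line: NF=2, because only "; matches, so NF>2 skips it
awk -F':g[.]|";' '{
  printf "NF=%d", NF
  for (i = 1; i <= NF; i++) printf " [%s]", $i
  printf "\n"
}' variants_sample.txt

# The filter itself, with the result written to another file.
awk -F':g[.]|";' 'NF>2 {print $2}' variants_sample.txt > extracted.txt
```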
Answer 2 (score: 0)
You can do this with awk, grep and sed:
awk -F'name=' '{print $2}' variants.txt | grep 'g\.' | awk -F'g.' '{print $2}' | sed -e 's/";//g'
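That is:
1. Collect the part of each line of the original file from "name=" to the end.
2. Keep only the lines that contain "g.".
3. Take the part from "g." to the end.
4. Remove the trailing " and ; characters to get the output shown in the example.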
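The four steps can also be run one at a time, keeping each intermediate result in a file so it can be inspected. This sketch assumes step 2 is done with grep 'g\.' and uses a two-line sample from the question; stage1.txt to stage3.txt and extracted.txt are illustrative names.

```shell
# Illustrative two-line sample taken from the question.
cat > variants_sample.txt <<'EOF'
select chrom,chromStart,chromEnd,name from snp147 where name="NC_000022.11:g.42131125C>G";
select chrom,chromStart,chromEnd,name from snp147 where name="rs138100349 ";
EOF

# Step 1: keep what follows name= (e.g. "NC_000022.11:g.42131125C>G";)
awk -F'name=' '{print $2}' variants_sample.txt > stage1.txt
# Step 2: keep only lines containing a literal g. (drops the rs line)
grep 'g\.' stage1.txt > stage2.txt
# Step 3: keep what follows the separator. -F'g.' is a regex whose dot
# matches any character; that is safe here because the only lowercase g
# in these names is the one in :g.
awk -F'g.' '{print $2}' stage2.txt > stage3.txt
# Step 4: remove the trailing "; to leave just the variant description.
sed -e 's/";//g' stage3.txt > extracted.txt
```

Without step 2, the rs lines would come through step 3 as empty lines, since they contain no g. to split on.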