I want to extract the string between gi| and the next | on every line. The string is in the same position on all lines.
This is what I am trying:
cat ERR594382_second_cat.test | sed -n '/gi\|/,/\|/p'
However, it does not work.
Here is the head of my file:
head ERR594382_second_cat.test
ERR594382.28316455_3_6_1 gi|914605561|ref|WP_050599988.1| 22 54 67 99 4.03e-15 77.0 100.000 33 0 0 225971;1306953 Bacteria Erythrobacter citreus;Erythrobacter citreus LAMA 915 ribonuclease D [Erythrobacter citreus]
ERR594382.28316455_65_2_3 gi|914605561|ref|WP_050599988.1| 13 46 11 44 2.15e-17 82.8 100.000 34 0 0 225971;1306953 Bacteria Erythrobacter citreus;Erythrobacter citreus LAMA 915 ribonuclease D [Erythrobacter citreus]
ERR594382.28316459_1_1_2 gi|1270336953|gb|PHR32068.1| 8 53 863 903 6.98e-08 56.6 63.043 46 12 1 2024840 Bacteria Methylophaga sp. phosphohydrolase [Methylophaga sp.]
ERR594382.28316464_2_2_3 gi|705244733|gb|AIW56710.1| 2 33 145 176 5.76e-12 67.8 93.750 32 2 0 340016 Viruses uncultured virus ribonucleotide reductase, partial [uncultured virus]
ERR594382.28316464_53_5_5 gi|1200458341|gb|OUV73944.1| 1 31 557 587 9.54e-11 64.3 80.645 31 6 0 1986721 Bacteria Flavobacteriales bacterium TMED123 hypothetical protein CBC83_04720 [Flavobacteriales bacterium TMED123]
ERR594382.28316465_3_3_2 gi|787065740|dbj|BAR36435.1| 1 46 204 249 5.55e-10 63.2 58.696 46 19 0 1407671 Viruses uncultured Mediterranean phage uvMED hypothetical protein [uncultured Mediterranean phage uvMED]
ERR594382.28316465_67_4_3 gi|787065740|dbj|BAR36435.1| 2 34 224 256 1.31e-07 55.1 66.667 33 11 0 1407671 Viruses uncultured Mediterranean phage uvMED hypothetical protein [uncultured Mediterranean phage uvMED]
ERR594382.28316466_18_6_3 gi|1200295886|gb|OUU17830.1| 1 33 92 124 1.73e-12 70.1 100.000 33 0 0 1986638 Bacteria Alphaproteobacteria bacterium TMED37 hypothetical protein CBB97_21775 [Candidatus Endolissoclinum sp. TMED37]
ERR594382.28316470_37_1_1 gi|787067413|dbj|BAR37857.1| 16 43 60 87 1.94e-09 58.9 96.429 28 1 0 1407671 Viruses uncultured Mediterranean phage uvMED terminase large subunit [uncultured Mediterranean phage uvMED]
ERR594382.28316474_2_5_1 gi|1219813777|gb|ASN63501.1| 1 33 62 94 3.55e-12 64.3 81.818 33 6 0 340016 Viruses uncultured
Answer 0: (score: 0)
You can use grep (GNU grep, with -P for Perl-compatible regular expressions):
grep -oP "gi\|\K.+?(?=\|)" file
or, if you are on macOS, pcregrep:
pcregrep -o "gi\|\K.+?(?=\|)" file
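\K can be read as: discard everything matched to its left and return only what comes after it. Then .+?(?=\|) lazily matches any characters up to, but not including, the next |.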
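The sed attempt in the question fails because /start/,/end/ is a line-range address: it selects whole lines rather than text inside a line. Where grep -P is unavailable, the same extraction can be sketched with a sed capture group. This is an alternative, not part of the original answer. The two sample rows are abbreviated from the question's file, and the temporary file and variable names are illustrative assumptions.

```shell
# Hypothetical two-row sample, abbreviated from the rows shown in the question.
sample=$(mktemp)
printf '%s\n' \
  'ERR594382.28316455_3_6_1 gi|914605561|ref|WP_050599988.1| 22 54' \
  'ERR594382.28316459_1_1_2 gi|1270336953|gb|PHR32068.1| 8 53' > "$sample"

# -n suppresses default printing; -E enables extended regex, where \| is a literal pipe.
# The capture group takes the run of non-| characters right after the first "gi|",
# and the trailing p prints only lines where the substitution succeeded.
ids=$(sed -nE 's/^[^|]*gi\|([^|]+)\|.*/\1/p' "$sample")
printf '%s\n' "$ids"

rm -f "$sample"
```

Anchoring on ^[^|]* ties the match to the first gi| on the line, so a later gi| in a description column would not be picked up.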
If splitting on the delimiter is all you need, the simplest approach is probably cut:
cut
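The cut command above is cut off in the source, so its exact options are unknown. One possible invocation, assuming the wanted ID is always the second |-delimited field (as it is in the rows shown in the question), might look like the sketch below. The sample rows and variable names are illustrative.

```shell
# Hypothetical two-row sample, abbreviated from the rows shown in the question.
sample=$(mktemp)
printf '%s\n' \
  'ERR594382.28316455_3_6_1 gi|914605561|ref|WP_050599988.1| 22 54' \
  'ERR594382.28316459_1_1_2 gi|1270336953|gb|PHR32068.1| 8 53' > "$sample"

# Split each line on "|" and keep field 2.
# Field 1 is everything up to and including "gi"; field 2 is the numeric ID.
ids_cut=$(cut -d'|' -f2 "$sample")
printf '%s\n' "$ids_cut"

rm -f "$sample"
```

This relies on no | appearing before gi| on any line. If that does not hold, a regex-based approach is safer.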