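I submitted a query with amino acid sequences to the "KAAS - KEGG Automatic Annotation Server".
I then downloaded the result file, named "myfile.keg". A small example file showing what it looks like can be downloaded here: https://www.dropbox.com/s/ixf0091z5q3cx9z/myfile.keg?dl=0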
+D KO
#<h2><a href="/kegg/kegg2.html"><img src="/Fig/bget/kegg3.gif" align="middle" border=0></a> KEGG Orthology (KO)</h2> 75prot_protdiff_GD_5h
!
A<b>Metabolism</b>
B
B <b>Carbohydrate metabolism</b>
C 00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D MYGENEACCESSION01; K01623 ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
C 00020 Citrate cycle (TCA cycle) [PATH:ko00020]
C 00030 Pentose phosphate pathway [PATH:ko00030]
D MYGENEACCESSION02; K01623 ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
C 00040 Pentose and glucuronate interconversions [PATH:ko00040]
C 00051 Fructose and mannose metabolism [PATH:ko00051]
D MYGENEACCESSION03; K17497 PMM; phosphomannomutase [EC:5.4.2.8]
D MYGENEACCESSION04; K01623 ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
C 00052 Galactose metabolism [PATH:ko00052]
C 00053 Ascorbate and aldarate metabolism [PATH:ko00053]
C 00500 Starch and sucrose metabolism [PATH:ko00500]
C 00520 Amino sugar and nucleotide sugar metabolism [PATH:ko00520]
D MYGENEACCESSION05; K01183 E3.2.1.14; chitinase [EC:3.2.1.14]
C 00620 Pyruvate metabolism [PATH:ko00620]
C 00630 Glyoxylate and dicarboxylate metabolism [PATH:ko00630]
C 00640 Propanoate metabolism [PATH:ko00640]
C 00650 Butanoate metabolism [PATH:ko00650]
C 00660 C5-Branched dibasic acid metabolism [PATH:ko00660]
C 00562 Inositol phosphate metabolism [PATH:ko00562]
B
!
#<hr>
#<b>[ <a href="/kegg/ko.html">KO</a> | <a href="/kegg/brite.html">BRITE</a> | <a href="/kegg/kegg2.html">KEGG2</a> | <a href="/kegg/">KEGG</a> ]</b><br>
#Last updated: May 18, 2018
#<br><br><a href="/kegg-bin/get_htext?ko00001_all.keg">» All categories</a>
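(I opened it with Notepad++.)
In this file you can see the different KEGG functional categories for each of my genes, which are named "MYGENEACCESSION01" (or "-02", "-03", etc.).
I would like to extract all the information from the .keg file and organize it into a new file (e.g. Excel), like this: https://www.dropbox.com/s/xq4714ngesap9dx/annotation.xlsx?dl=0
CSV version: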
accession,kegg.first.level,kegg.second.level,kegg.third.level,kegg.fourth.level,path,KO
MYGENEACCESSION01,metabolism,carbohydrate metabolism,Glycolysis / Gluconeogenesis,"ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]",PATH:ko00010,K01623
MYGENEACCESSION02,metabolism,carbohydrate metabolism,Pentose phosphate pathway,"ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]",PATH:ko00030,K01623
MYGENEACCESSION03,metabolism,carbohydrate metabolism,Fructose and mannose metabolism,PMM; phosphomannomutase [EC:5.4.2.8],PATH:ko00051,K17497
MYGENEACCESSION04,metabolism,carbohydrate metabolism,Fructose and mannose metabolism,"ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]",PATH:ko00051,K01623
MYGENEACCESSION05,metabolism,carbohydrate metabolism,Amino sugar and nucleotide sugar metabolism,chitinase [EC:3.2.1.14],PATH:ko00520,K01183
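I did this by hand, but it is very tedious, and my dataset is much larger than the example provided.
Any ideas on how to automate this with R or another program? (Do you think an R script could do the job?)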
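
To make the goal concrete, here is a minimal sketch of the kind of script I have in mind, written in Python rather than R. It assumes the layout seen in the example file: "A" lines hold the first level, "B" lines the second, "C" lines the pathway name plus its "[PATH:...]" tag, and "D" lines look like "accession; K-number description". The names `parse_keg`, `rows_to_csv`, `HEADER` and `SAMPLE` are made up for illustration, and the regular expressions are only guesses based on this small example, so they may need adjusting for a full KAAS output.

```python
import csv
import io
import re

HEADER = ["accession", "kegg.first.level", "kegg.second.level",
          "kegg.third.level", "kegg.fourth.level", "path", "KO"]

# Assumed gene line layout: "D <accession>; <K number> <name and description>"
GENE_RE = re.compile(r"^D\s+(\S+);\s+(K\d{5})\s+(.*)$")
# Assumed pathway line layout: "C <map number> <pathway name> [PATH:koNNNNN]"
PATHWAY_RE = re.compile(r"^C\s+\d+\s+(.*?)\s*(?:\[(PATH:[^\]]+)\])?\s*$")
# Strips HTML tags such as <b>...</b> from the A and B lines
TAG_RE = re.compile(r"<[^>]+>")


def parse_keg(lines):
    """Return one row per D line, carrying along the current A/B/C context."""
    level1 = level2 = level3 = path = ""
    rows = []
    for raw in lines:
        line = raw.rstrip("\n")
        # Skip blank lines and the "+", "#", "!" header/footer lines
        if not line or line[0] not in "ABCD":
            continue
        kind = line[0]
        if kind == "A":
            level1 = TAG_RE.sub("", line[1:]).strip()
            level2 = level3 = path = ""
        elif kind == "B":
            text = TAG_RE.sub("", line[1:]).strip()
            if text:  # a bare "B" line is only a separator
                level2 = text
                level3 = path = ""
        elif kind == "C":
            match = PATHWAY_RE.match(line)
            if match:
                level3, path = match.group(1), match.group(2) or ""
        else:  # kind == "D"
            match = GENE_RE.match(line)
            if match:
                accession, ko, description = match.groups()
                rows.append([accession, level1, level2, level3,
                             description, path, ko])
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text; fields containing commas get quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


# A few lines copied from the example file above
SAMPLE = """\
+D KO
!
A<b>Metabolism</b>
B
B <b>Carbohydrate metabolism</b>
C 00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D MYGENEACCESSION01; K01623 ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
C 00020 Citrate cycle (TCA cycle) [PATH:ko00020]
C 00051 Fructose and mannose metabolism [PATH:ko00051]
D MYGENEACCESSION03; K17497 PMM; phosphomannomutase [EC:5.4.2.8]
D MYGENEACCESSION04; K01623 ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
!
"""

if __name__ == "__main__":
    print(rows_to_csv(parse_keg(SAMPLE.splitlines())), end="")
```

For the real file, the same function could presumably be fed an open file handle, e.g. `parse_keg(open("myfile.keg", encoding="utf-8"))`. Note that this sketch keeps the names exactly as they appear in the file (so "Metabolism" rather than "metabolism", and "E3.2.1.14; chitinase [EC:3.2.1.14]" for the chitinase entry), which differs slightly from my hand-made table.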