I am using a package that, for each element of a list, writes the following lines to a file:
Entry Entry name Status Protein names Gene names Organism
A0A20CSC4 A0A20CSC4_1PHYC unreviewed Uncharacterized protein OlL7_200 Ostreococcus lucimarinus virus 7
Entry Entry name Status Protein names Gene names Organism
A0A0P0DZ8 A0A0PCDZ8_9PLYC unreviewed Uncharacterized protein OlL7_159 Ostreococcus lucimarinus virus 7
Entry Entry name Status Protein names Gene names Organism
A0A1P0BY71 A0A1P0BY71_9PHYC unreviewed Uncharacterized protein OlL7_111c Ostreococcus lucimarinus virus 7
... (this pattern repeats 1000 times)
So if I open this file with pandas, I get a dataframe like this:
>>> blast
Entry Entry name Status Protein names Gene names
0 A0A20CSC4 A0A20CSC4_1PHYC unreviewed Uncharacterized protein OlL7_200
1 NaN NaN NaN NaN NaN
2 A0A0P0DZ8 A0A0PCDZ8_9PLYC unreviewed Uncharacterized protein OlL7_159
3 NaN NaN NaN NaN NaN
4 Entry Entry name Status Protein names Gene names
5 A0A1P0BY71 A0A1P0BY71_9PHYC unreviewed Uncharacterized protein OlL7_111c
What I want is a single dataframe that looks like this:
Entry Entry name Status Protein names Gene names Organism
A0A20CSC4 A0A20CSC4_1PHYC unreviewed Uncharacterized protein OlL7_200 Ostreococcus lucimarinus virus 7
A0A0P0DZ8 A0A0PCDZ8_9PLYC unreviewed Uncharacterized protein OlL7_159 Ostreococcus lucimarinus virus 7
A0A1P0BY71 A0A1P0BY71_9PHYC unreviewed Uncharacterized protein OlL7_111c Ostreococcus lucimarinus virus 7
Is there a way to do this with pandas in Python 3?
Update: the dataframe now looks like this:
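One option is to skip the repeated header lines and blank lines before pandas ever parses the text. The sketch below assumes the columns are tab-separated (the question does not say which delimiter the package uses) and that every repeated header is identical to the first line; the function name read_repeated_header_file is hypothetical. It is an alternative to cleaning the dataframe after loading, not the approach from the answer below.

```python
import io

import pandas as pd


def read_repeated_header_file(text: str) -> pd.DataFrame:
    """Parse text where every record is preceded by a copy of the header line.

    Assumption: columns are tab-separated; change `sep` if the real file differs.
    """
    kept = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if kept and line == kept[0]:
            continue  # ignore every repeat of the first (header) line
        kept.append(line)
    return pd.read_csv(io.StringIO("\n".join(kept)), sep="\t")


# Small in-memory sample built from the rows shown in the question.
HEADER = "Entry\tEntry name\tStatus\tProtein names\tGene names\tOrganism"
sample = "\n".join([
    HEADER,
    "A0A20CSC4\tA0A20CSC4_1PHYC\tunreviewed\tUncharacterized protein\tOlL7_200\tOstreococcus lucimarinus virus 7",
    HEADER,
    "A0A0P0DZ8\tA0A0PCDZ8_9PLYC\tunreviewed\tUncharacterized protein\tOlL7_159\tOstreococcus lucimarinus virus 7",
    "",
    HEADER,
    "A0A1P0BY71\tA0A1P0BY71_9PHYC\tunreviewed\tUncharacterized protein\tOlL7_111c\tOstreococcus lucimarinus virus 7",
])

blast = read_repeated_header_file(sample)
print(blast)
```

For a real file the call would be something like read_repeated_header_file(open("blast.tsv").read()), where "blast.tsv" is a placeholder name. Filtering at the text level means pandas never sees the extra headers, so no NaN or header rows need to be removed afterwards.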
Entry Entry name Status Protein names Gene names
0 A0A20CSC4 A0A20CSC4_1PHYC unreviewed Uncharacterized protein OlL7_200
2 A0A0P0DZ8 A0A0PCDZ8_9PLYC unreviewed Uncharacterized protein OlL7_159
4 Entry Entry name Status Protein names Gene names
5 A0A1P0BY71 A0A1P0BY71_9PHYC unreviewed Uncharacterized protein OlL7_111c
Row 4 still contains the header (the column names).
Answer 0 (score: 1)
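One way to get this output is to drop the rows containing NaN values and then drop the leftover rows that repeat the header.
So you can do: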
blast.dropna(inplace=True)  # drop rows that contain any NaN (the empty rows)
blast.drop(blast[blast['Entry'] == 'Entry'].index, inplace=True)  # drop rows that are copies of the header
This should work.
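To check the two steps end to end, here is a minimal sketch on a small hypothetical frame shaped like the one in the question (only two of the columns, with all-NaN rows and one repeated header row). The final reset_index call is an optional addition, not part of the answer, that renumbers the surviving rows 0..n-1 instead of leaving gaps such as 0, 2, 5.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mimicking the question: data rows, all-NaN rows,
# and one repeated header row.
blast = pd.DataFrame({
    "Entry": ["A0A20CSC4", np.nan, "A0A0P0DZ8", np.nan, "Entry", "A0A1P0BY71"],
    "Entry name": ["A0A20CSC4_1PHYC", np.nan, "A0A0PCDZ8_9PLYC", np.nan,
                   "Entry name", "A0A1P0BY71_9PHYC"],
})

# Step 1: drop rows that contain any NaN (the empty rows).
blast.dropna(inplace=True)
# Step 2: drop rows whose "Entry" cell is the literal header text.
blast.drop(blast[blast["Entry"] == "Entry"].index, inplace=True)
# Optional: renumber the remaining rows 0..n-1.
blast.reset_index(drop=True, inplace=True)
print(blast)
```

Step 2 is needed because dropna() only removes rows with missing values; a repeated header row has no NaN in it, which is why row 4 survived in the updated dataframe.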