Merge multiple dataframes, keeping only one set of colnames

Asked: 2019-02-27 15:05:11

Tags: python pandas

I am using a package that, for each element of a list, prints the following lines to a file:

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A20CSC4   A0A20CSC4_1PHYC unreviewed  Uncharacterized protein OlL7_200    Ostreococcus lucimarinus virus 7

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A0P0DZ8   A0A0PCDZ8_9PLYC unreviewed  Uncharacterized protein OlL7_159    Ostreococcus lucimarinus virus 7

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A1P0BY71  A0A1P0BY71_9PHYC    unreviewed  Uncharacterized protein OlL7_111c   Ostreococcus lucimarinus virus 7

... X 1000

So if I open this file with pandas, I get a dataframe such as:

>>> blast
        Entry        Entry name      Status            Protein names  Gene names
0   A0A20CSC4   A0A20CSC4_1PHYC  unreviewed  Uncharacterized protein    OlL7_200
1         NaN               NaN         NaN                      NaN         NaN
2   A0A0P0DZ8   A0A0PCDZ8_9PLYC  unreviewed  Uncharacterized protein    OlL7_159
3         NaN               NaN         NaN                      NaN         NaN
4       Entry        Entry name      Status            Protein names  Gene names
5  A0A1P0BY71  A0A1P0BY71_9PHYC  unreviewed  Uncharacterized protein   OlL7_111c

What I want instead is a single dataframe with only one set of column names, like this:

Entry   Entry name  Status  Protein names   Gene names  Organism
A0A20CSC4   A0A20CSC4_1PHYC unreviewed  Uncharacterized protein OlL7_200    Ostreococcus lucimarinus virus 7
A0A0P0DZ8   A0A0PCDZ8_9PLYC unreviewed  Uncharacterized protein OlL7_159    Ostreococcus lucimarinus virus 7
A0A1P0BY71  A0A1P0BY71_9PHYC    unreviewed  Uncharacterized protein OlL7_111c   Ostreococcus lucimarinus virus 7

Do you know a way to do this with pandas in Python 3?

Updated dataframe:

        Entry        Entry name      Status            Protein names  Gene names
0   A0A20CSC4   A0A20CSC4_1PHYC  unreviewed  Uncharacterized protein    OlL7_200
2   A0A0P0DZ8   A0A0PCDZ8_9PLYC  unreviewed  Uncharacterized protein    OlL7_159
4       Entry        Entry name      Status            Protein names  Gene names
5  A0A1P0BY71  A0A1P0BY71_9PHYC  unreviewed  Uncharacterized protein   OlL7_111c

Row 4 still contains the column names.

1 answer:

Answer 0: (score: 1)

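One way to get this kind of output is to drop the NaN rows first. You can do that with:

blast.dropna(inplace=True)

That still leaves the repeated header rows (such as row 4). To remove them, drop every row whose Entry value is the literal string 'Entry':

blast.drop(blast[blast['Entry'] == 'Entry'].index, inplace=True)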

This should work.
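For reference, here is a minimal end-to-end sketch of the same idea. It assumes the file is tab-separated and uses a small in-memory sample in place of the real file; the `raw` string and the `clean` variable are illustrative names, not part of the original question. It also uses non-inplace calls rather than `inplace=True`, which is only a stylistic choice.

```python
import io

import pandas as pd

# Hypothetical sample mimicking the file layout from the question:
# the header line is repeated before every record, with blank lines between.
# Tab separation is an assumption about the real file.
header = "Entry\tEntry name\tStatus\tProtein names\tGene names\tOrganism\n"
raw = (
    header
    + "A0A20CSC4\tA0A20CSC4_1PHYC\tunreviewed\tUncharacterized protein\tOlL7_200\tOstreococcus lucimarinus virus 7\n"
    + "\n"
    + header
    + "A0A0P0DZ8\tA0A0PCDZ8_9PLYC\tunreviewed\tUncharacterized protein\tOlL7_159\tOstreococcus lucimarinus virus 7\n"
    + "\n"
    + header
    + "A0A1P0BY71\tA0A1P0BY71_9PHYC\tunreviewed\tUncharacterized protein\tOlL7_111c\tOstreococcus lucimarinus virus 7\n"
)

# Keep blank lines so they show up as all-NaN rows, as in the question.
blast = pd.read_csv(io.StringIO(raw), sep="\t", skip_blank_lines=False)

# Step 1: drop rows in which every column is NaN (the blank lines).
clean = blast.dropna(how="all")

# Step 2: drop rows that merely repeat the header, then renumber the index.
clean = clean[clean["Entry"] != "Entry"].reset_index(drop=True)

print(clean)
```

With the real file, `io.StringIO(raw)` would be replaced by the file path. Filtering with a boolean mask and reassigning, instead of calling `drop(..., inplace=True)`, keeps the original `blast` untouched in case you need to inspect it again.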