Rename columns + add missing columns to a dataframe

Asked: 2016-12-09 18:07:46

Tags: python pandas

Background:

I have a dataframe with a column that looks like this:

>>> merge_df['AAChange']
0    STK11:NM_000455:exon1:c.148_149TG
Name: AAChange, dtype: object

I need to split it on the ':' character, like this:

>>> new_cols = merge_df['AAChange'].str.split(':').apply(pd.Series,1)
>>> new_cols
       0          1      2            3
0  STK11  NM_000455  exon1  c.148_149TG

Then I need to rename the columns, so I stored the new names in a list:

>>> new_colnames = ['Gene.AA', 'Transcript', 'Exon', 'Coding', 'Amino Acid Change']

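However, there is a problem: all 5 of these columns must be present in the output, but for this data entry one field is missing from the source data, leaving only 4 fields. As a result, trying to rename the columns fails: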

>>> new_cols.columns = new_colnames
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/local/apps/python/2.7.3/lib/python2.7/site-packages/pandas/core/generic.py", line 2371, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/src/properties.pyx", line 65, in pandas.lib.AxisProperty.__set__ (pandas/lib.c:45002)
  File "/local/apps/python/2.7.3/lib/python2.7/site-packages/pandas/core/generic.py", line 425, in _set_axis
    self._data.set_axis(axis, labels)
  File "/local/apps/python/2.7.3/lib/python2.7/site-packages/pandas/core/internals.py", line 2572, in set_axis
    'new values have %d elements' % (old_len, new_len))
ValueError: Length mismatch: Expected axis has 4 elements, new values have 5 elements

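So I would like to add an empty column for each missing column and change the column names at the same time. This answer seemed to offer a good solution: reindex against the new list of columns. However, it does not give the expected result: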

>>> new_cols.reindex(columns = new_colnames)
   Gene.AA  Transcript  Exon  Coding  Amino Acid Change
0      NaN         NaN   NaN     NaN                NaN

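Now I have all of the missing columns, but the original data is lost. Is there a better solution that lets me rename the existing columns and add all of the missing ones?

The desired output looks like this: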

>>> new_cols.reindex(columns = new_colnames)
   Gene.AA  Transcript   Exon         Coding  Amino Acid Change
0    STK11   NM_000455  exon1   c.148_149TG                NaN

1 answer:

Answer 0: (score: 0)

You can rename the original columns with the leading names from the desired list.

new_cols.columns = new_colnames[:-1]

# new_cols
  Gene.AA Transcript   Exon       Coding
0   STK11  NM_000455  exon1  c.148_149TG
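The slice `[:-1]` above hard-codes exactly one missing field. As a minimal sketch, assuming the missing fields are always the trailing ones, the slice length can be taken from the frame itself. The frame below is rebuilt by hand from the question's example, and `n_present` is a name introduced here for illustration only.

```python
import pandas as pd

# Rebuild the example frame from the question (integer column labels 0..3).
new_cols = pd.DataFrame([["STK11", "NM_000455", "exon1", "c.148_149TG"]])
new_colnames = ['Gene.AA', 'Transcript', 'Exon', 'Coding', 'Amino Acid Change']

# Use only as many names as there are columns, rather than hard-coding [:-1].
# Assumption: any missing fields are at the end of the name list.
n_present = new_cols.shape[1]
new_cols.columns = new_colnames[:n_present]

print(new_cols)
```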

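Then insert the extra one with the following command. It inserts the new column at position 4 (zero-based, so it becomes the fifth column) and fills it with NaN values.

import numpy as np

new_cols.insert(4, new_colnames[-1], [np.nan]*len(new_cols.index))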

# new_cols
  Gene.AA Transcript   Exon       Coding  Amino Acid Change
0   STK11  NM_000455  exon1  c.148_149TG                NaN
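
For what it's worth, the `reindex` attempt in the question returned all NaN because `reindex` matches on column labels, and the split frame's labels are the integers 0 to 3, none of which appear in `new_colnames`. A possible sketch that combines both steps is to rename first and reindex afterwards. The helper name `rename_and_pad` is hypothetical, `str.split(..., expand=True)` is swapped in for the question's `apply(pd.Series, 1)`, and the approach assumes that missing fields are always the trailing ones.

```python
import pandas as pd


def rename_and_pad(df, names):
    """Rename existing columns by position, then add missing ones as NaN.

    Hypothetical helper; assumes missing fields are at the end of `names`.
    """
    # zip() stops at the shorter sequence, so only existing columns are renamed.
    renamed = df.rename(columns=dict(zip(df.columns, names)))
    # reindex matches on labels: renamed columns are kept, absent ones become NaN.
    return renamed.reindex(columns=names)


merge_df = pd.DataFrame({"AAChange": ["STK11:NM_000455:exon1:c.148_149TG"]})
new_colnames = ['Gene.AA', 'Transcript', 'Exon', 'Coding', 'Amino Acid Change']

# expand=True returns the split pieces as a DataFrame with columns 0..3.
new_cols = merge_df["AAChange"].str.split(":", expand=True)

result = rename_and_pad(new_cols, new_colnames)
print(result)
```

Unlike `insert`, this does not modify `new_cols` in place, and it should cope with more than one missing trailing column without changing the code.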