Question

Here is my code so far:

import numpy as np
import pandas as pd
df = pd.read_excel(r'file.xlsx', index_col=0)

The resulting DataFrame looks like this:

I want to rename each 'Unnamed: *' column to the last valid column name before it.

Here is what I tried, and the result:

df.columns = df.columns.str.replace('Unnamed.*', method='ffill')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-253-c868b8bff7c7> in <module>()
----> 1 df.columns = df.columns.str.replace('Unnamed.*', method='ffill')

TypeError: replace() got an unexpected keyword argument 'method'

This "works" if I do:

df.columns = df.columns.str.replace('Unnamed.*', '')

But then I am left with blank names (or the string 'NaN', if I replace '' with 'NaN'). Then I tried:

df.columns = df.columns.fillna('ffill')

Which does nothing. So I tried inplace=True:

df.columns = df.columns.fillna('ffill', inplace=True)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-279-cce486472d5b> in <module>()
----> 1 df.columns = df.columns.fillna('ffill', inplace=True)

TypeError: fillna() got an unexpected keyword argument 'inplace'

Then I tried another way:

i = 0
while i < len(df.columns):
    if df.columns[i] == 'NaN':
        df.columns[i] = df.columns[i-1]
    print(df.columns[i])
    i += 1

Which gives me this error:

Oil
158 RGN Mistura
Access West Winter Blend 

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-246-bc8fa6881b1a> in <module>()
      2 while i < len(df.columns):
      3     if df.columns[i] == 'NaN':
----> 4         df.columns[i] = df.columns[i-1]
      5     print(df.columns[i])
      6     i += 1

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
   2048 
   2049     def __setitem__(self, key, value):
-> 2050         raise TypeError("Index does not support mutable operations")
   2051 
   2052     def __getitem__(self, key):

TypeError: Index does not support mutable operations

Answer 1

I did the following, which I think preserves the column order you are looking for.

df = pd.read_excel('book1.xlsx')
print(df)


    a   b   c  Unnamed: 3  Unnamed: 4   d  Unnamed: 6   e  Unnamed: 8   f
0  34  13  73         nan         nan  87         nan  76         nan  36
1  70  48   1         nan         nan  88         nan   2         nan  77
2  37  62  28         nan         nan   2         nan  53         nan  60
3  17  97  78         nan         nan  69         nan  93         nan  48
4  65  19  96         nan         nan  72         nan   4         nan  57
5  63   6  86         nan         nan  14         nan  20         nan  51
6  10  67  54         nan         nan  52         nan  48         nan  79


df.columns = pd.Series([np.nan if 'Unnamed:' in x else x for x in df.columns.values]).ffill().values.flatten()
print(df)


    a   b   c   c   c   d   d   e   e   f
0  34  13  73 nan nan  87 nan  76 nan  36
1  70  48   1 nan nan  88 nan   2 nan  77
2  37  62  28 nan nan   2 nan  53 nan  60
3  17  97  78 nan nan  69 nan  93 nan  48
4  65  19  96 nan nan  72 nan   4 nan  57
5  63   6  86 nan nan  14 nan  20 nan  51
6  10  67  54 nan nan  52 nan  48 nan  79
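
For anyone who wants to try this without an Excel file, here is a small self-contained sketch of the same idea. The helper name ffill_unnamed, its marker parameter and the sample data are made up for illustration:

```python
import numpy as np
import pandas as pd

def ffill_unnamed(columns, marker='Unnamed:'):
    """Return column names with marker-bearing names replaced by the last valid name."""
    # Blank out the placeholder names, then forward-fill the gaps.
    names = pd.Series(
        [np.nan if marker in str(c) else c for c in columns],
        dtype=object)
    return names.ffill().tolist()

df = pd.DataFrame(
    np.arange(10).reshape(2, 5),
    columns=['a', 'Unnamed: 1', 'Unnamed: 2', 'd', 'Unnamed: 4'])

df.columns = ffill_unnamed(df.columns)
print(df.columns.tolist())
```

If the very first column is itself unnamed, there is no earlier name to copy, so it stays NaN.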

Answer 2

One approach that works:

df.columns = df.columns.where(~df.columns.str.startswith('Unnamed')).to_series().ffill()

Full example:

import numpy as np
import pandas as pd

df = pd.DataFrame(columns=['First', 'Unnamed: 1', 'Unnamed: 2','Second', 'Unnamed: 3'])

df.columns = df.columns.where(~df.columns.str.startswith('Unnamed')).to_series().ffill()

print(df.columns)

Prints:

Index(['First', 'First', 'First', 'Second', 'Second'], dtype='object')
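
Breaking the one-liner into separate steps may make it easier to follow. This is only a sketch, and the intermediate names (is_named, masked, filled, new_cols) are illustrative:

```python
import pandas as pd

cols = pd.Index(['First', 'Unnamed: 1', 'Unnamed: 2', 'Second', 'Unnamed: 3'])

# True for the names we want to keep.
is_named = ~cols.str.startswith('Unnamed')

# Index.where keeps values where the condition holds and puts NaN elsewhere.
masked = cols.where(is_named)

# A Series supports ffill(), which copies the last valid name forward.
filled = masked.to_series().ffill()

new_cols = pd.Index(filled.to_numpy())
print(new_cols.tolist())
```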

Answer 3

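The problem you are running into comes from the fact that columns and indexes are pd.Index objects. The fillna method of a pandas Index does not take the same arguments as the fillna method of a pandas Series or DataFrame. I made a toy example below: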

import pandas as pd
import numpy as np
df = pd.DataFrame(
         {'a':[1], 'Unnamed:1':[1], 'Unnamed:2':[1], 'b':[1], 'Unnamed:3':[1]},
         columns=['a', 'Unnamed:3', 'Unnamed:1', 'b', 'Unnamed:2'])
df 
#   a  Unnamed:3  Unnamed:1  b  Unnamed:2
#0  1          1          1  1          1
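
To illustrate the difference between the two fillna methods, here is a small sketch (the names idx, literal and forward are only for illustration). It assumes a reasonably recent pandas. Index.fillna treats its argument as the fill value itself, whereas forward filling is available on a Series:

```python
import numpy as np
import pandas as pd

idx = pd.Index(['a', np.nan, 'b', np.nan])

# Index.fillna has no "method" option: the argument is used as the
# replacement value, so the literal string 'ffill' is inserted.
literal = idx.fillna('ffill')

# Converting to a Series gives access to ffill(), which propagates
# the last valid value forward.
forward = idx.to_series().ffill()

print(literal.tolist())
print(forward.tolist())
```

This also explains why df.columns.fillna('ffill') appeared to do nothing in the question: after the str.replace call the blank names were empty strings, not NaN, so there was nothing for fillna to fill.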

Your original regex does not capture the whole column name, so let's fix that.

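# regex=True is passed explicitly; newer pandas versions treat the pattern as a literal string by default
df.columns.str.replace('Unnamed:*', '', regex=True)
#Index(['a', '3', '1', 'b', '2'], dtype='object')
df.columns.str.replace(r'Unnamed:\d+', '', regex=True)
#Index(['a', '', '', 'b', ''], dtype='object')
df.columns.str.replace('Unnamed:.+', '', regex=True)
#Index(['a', '', '', 'b', ''], dtype='object')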

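Now let's convert the index to a Series so that we can use the .replace and .fillna methods of pd.Series, together with a working regex, to replace the relevant column names with NaN and then forward-fill (ffill) them. Finally, we convert the result back to a pd.Index.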
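
A minimal sketch of the steps just described, assuming the toy column names from above (the intermediate variable cols is only for illustration, and this is one possible way to write it rather than the definitive one):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, 1, 1]],
    columns=['a', 'Unnamed:3', 'Unnamed:1', 'b', 'Unnamed:2'])

# Index -> Series, so that the Series methods become available.
cols = df.columns.to_series()

# Replace every name matching the regex with NaN.
cols = cols.replace(r'Unnamed:.+', np.nan, regex=True)

# Forward-fill the gaps and convert back to an Index.
df.columns = pd.Index(cols.ffill())

print(df.columns.tolist())
```

Note that the resulting column labels are duplicated, so selecting a label such as df['a'] returns every column carrying that name.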

Pandas: rename "Unnamed: *" or "NaN" columns in a DataFrame
