Here is my code so far:
import numpy as np
import pandas as pd
df = pd.read_excel(r'file.xlsx', index_col=0)
I want to rename the "Unnamed: *" columns to the last valid column name that precedes them.
Here is what I tried and the result:
df.columns = df.columns.str.replace('Unnamed.*', method='ffill')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-253-c868b8bff7c7> in <module>()
----> 1 df.columns = df.columns.str.replace('Unnamed.*', method='ffill')
TypeError: replace() got an unexpected keyword argument 'method'
This "works" if I do:
df.columns = df.columns.str.replace('Unnamed.*', '')
But then I get a blank value (or 'NaN' if I replace with 'NaN' instead of ''). Then I tried:
df.columns = df.columns.fillna('ffill')
Which didn't work. So I tried inplace=True:
df.columns = df.columns.fillna('ffill',inplace = True)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-279-cce486472d5b> in <module>()
----> 1 df.columns = df.columns.fillna('ffill', inplace=True)
TypeError: fillna() got an unexpected keyword argument 'inplace'
Then I tried another way:
i = 0
while i < len(df.columns):
    if df.columns[i] == 'NaN':
        df.columns[i] = df.columns[i-1]
    print(df.columns[i])
    i += 1
Which gives me this output and error:
Oil
158 RGN Mistura
Access West Winter Blend
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-246-bc8fa6881b1a> in <module>()
2 while i < len(df.columns):
3 if df.columns[i] == 'NaN':
----> 4 df.columns[i] = df.columns[i-1]
5 print(df.columns[i])
6 i += 1
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in __setitem__(self, key, value)
2048
2049 def __setitem__(self, key, value):
-> 2050 raise TypeError("Index does not support mutable operations")
2051
2052 def __getitem__(self, key):
TypeError: Index does not support mutable operations
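A pandas Index is immutable, so element assignment such as df.columns[i] = ... always raises this TypeError. A minimal sketch of the same loop idea that avoids it, assuming the placeholder labels start with "Unnamed" (as read_excel produces for blank headers): collect the new names in a plain list, then assign the whole list to df.columns at once. ffill_names is a hypothetical helper, not a pandas function.

```python
import pandas as pd


def ffill_names(columns):
    """Hypothetical helper: give each 'Unnamed...' label the last real label before it.

    Placeholders that appear before any real label keep their original name.
    """
    new_names = []
    last_valid = None
    for name in columns:
        is_placeholder = str(name).startswith("Unnamed")
        if not is_placeholder:
            last_valid = name
        # Use the last real name for placeholders; otherwise keep the label as-is.
        new_names.append(last_valid if is_placeholder and last_valid is not None else name)
    return new_names


df = pd.DataFrame(columns=["Oil", "Unnamed: 1", "Unnamed: 2",
                           "Access West Winter Blend", "Unnamed: 4"])
# Replace the whole Index in one assignment instead of mutating it element by element.
df.columns = ffill_names(df.columns)
print(list(df.columns))
# ['Oil', 'Oil', 'Oil', 'Access West Winter Blend', 'Access West Winter Blend']
```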
Answer 0 (score: 1)
I did the following, which I think preserves the ordering you're after.
df = pd.read_excel('book1.xlsx')
print(df)
a b c Unnamed: 3 Unnamed: 4 d Unnamed: 6 e Unnamed: 8 f
0 34 13 73 nan nan 87 nan 76 nan 36
1 70 48 1 nan nan 88 nan 2 nan 77
2 37 62 28 nan nan 2 nan 53 nan 60
3 17 97 78 nan nan 69 nan 93 nan 48
4 65 19 96 nan nan 72 nan 4 nan 57
5 63 6 86 nan nan 14 nan 20 nan 51
6 10 67 54 nan nan 52 nan 48 nan 79
df.columns = pd.Series([np.nan if 'Unnamed:' in x else x for x in df.columns.values]).ffill().values.flatten()
print(df)
a b c c c d d e e f
0 34 13 73 nan nan 87 nan 76 nan 36
1 70 48 1 nan nan 88 nan 2 nan 77
2 37 62 28 nan nan 2 nan 53 nan 60
3 17 97 78 nan nan 69 nan 93 nan 48
4 65 19 96 nan nan 72 nan 4 nan 57
5 63 6 86 nan nan 14 nan 20 nan 51
6 10 67 54 nan nan 52 nan 48 nan 79
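One consequence worth knowing: after this renaming the column labels are no longer unique, so selecting a label such as df['c'] returns a DataFrame containing every column with that name rather than a single Series. A small sketch, using a made-up two-row-free toy frame (the values are illustrative only) and the same one-liner as above:

```python
import numpy as np
import pandas as pd

# Toy frame in the same shape as the example above; values are arbitrary.
df = pd.DataFrame([[1, 2, np.nan, 3]], columns=["a", "c", "Unnamed: 2", "d"])

# Same approach as the answer: mark placeholders as NaN, then forward fill.
df.columns = pd.Series(
    [np.nan if "Unnamed:" in x else x for x in df.columns.values]
).ffill().values.flatten()

print(df.columns.tolist())   # ['a', 'c', 'c', 'd']
print(df.columns.is_unique)  # False
print(df["c"].shape)         # (1, 2): both 'c' columns come back as a DataFrame
```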
Answer 1 (score: 0)
Something that works:
df.columns = df.columns.where(~df.columns.str.startswith('Unnamed')).to_series().ffill()
Full example:
import numpy as np
import pandas as pd
df = pd.DataFrame(columns=['First', 'Unnamed: 1', 'Unnamed: 2','Second', 'Unnamed: 3'])
df.columns = df.columns.where(~df.columns.str.startswith('Unnamed')).to_series().ffill()
print(df.columns)
Prints:
Index(['First', 'First', 'First', 'Second', 'Second'], dtype='object')
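A caveat for this approach: a forward fill has nothing to carry into a placeholder that comes before the first real name, so that label stays NaN. A minimal sketch of one possible fallback, assuming you would rather keep the original "Unnamed" label in that position (the fallback is an illustrative choice, not part of the answer):

```python
import pandas as pd

cols = pd.Index(["Unnamed: 0", "First", "Unnamed: 2", "Second"])
mask = ~cols.str.startswith("Unnamed")

# Build the Series on a default RangeIndex so positions line up for the fallback.
forward = pd.Series(cols.where(mask)).ffill()
print(forward.tolist())   # [nan, 'First', 'First', 'Second']

# Illustrative fallback: restore the original label where nothing could be filled.
filled = forward.fillna(pd.Series(cols))
print(filled.tolist())    # ['Unnamed: 0', 'First', 'First', 'Second']
```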
Answer 2 (score: 0)
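The problem you are running into has to do with the fact that columns and indexes are pd.Index objects. The fillna method of a pandas Index takes different arguments than the fillna method of a pandas Series or DataFrame.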
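To make the difference concrete: Index.fillna only accepts a fill value, so df.columns.fillna('ffill') inserts the literal string 'ffill' wherever a label is missing. On top of that, str.replace(..., '') leaves empty strings, which are not missing values, so fillna has nothing to fill there. A short sketch contrasting this with Series.ffill:

```python
import numpy as np
import pandas as pd

idx = pd.Index(["a", np.nan, "b", np.nan])

# Index.fillna takes a fill value: "ffill" is inserted literally.
literal = idx.fillna("ffill")
print(literal.tolist())   # ['a', 'ffill', 'b', 'ffill']

# Series.ffill performs a real forward fill; wrap the result back into an Index.
forward = pd.Index(pd.Series(idx).ffill())
print(forward.tolist())   # ['a', 'a', 'b', 'b']

# Empty strings are not NaN, so fillna leaves them untouched.
blanks = pd.Index(["a", "", "b", ""])
print(blanks.fillna("x").tolist())   # ['a', '', 'b', '']
```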
I made a toy example below:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {'a':[1], 'Unnamed:1':[1], 'Unnamed:2':[1], 'b':[1], 'Unnamed:3':[1]},
    columns=['a', 'Unnamed:3', 'Unnamed:1', 'b', 'Unnamed:2'])
df
# a Unnamed:3 Unnamed:1 b Unnamed:2
#0 1 1 1 1 1
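A pattern like 'Unnamed:*' does not capture the whole column name (':*' only matches zero or more colons, so the trailing digits survive); let's fix that.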
df.columns.str.replace('Unnamed:*', '', regex=True)
#Index(['a', '3', '1', 'b', '2'], dtype='object')
df.columns.str.replace(r'Unnamed:\d+', '', regex=True)
#Index(['a', '', '', 'b', ''], dtype='object')
df.columns.str.replace('Unnamed:.+', '', regex=True)
#Index(['a', '', '', 'b', ''], dtype='object')
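The outputs above assume the pattern is treated as a regular expression. In pandas 2.0 and later, str.replace defaults to regex=False, so without regex=True the pattern is matched literally and nothing changes. A quick check:

```python
import pandas as pd

cols = pd.Index(["a", "Unnamed:3", "Unnamed:1", "b", "Unnamed:2"])

# Literal matching: no label contains the text "Unnamed:.+", so nothing is replaced.
literal = cols.str.replace("Unnamed:.+", "", regex=False).tolist()
print(literal)   # ['a', 'Unnamed:3', 'Unnamed:1', 'b', 'Unnamed:2']

# Regex matching: every placeholder label is blanked out.
as_regex = cols.str.replace("Unnamed:.+", "", regex=True).tolist()
print(as_regex)  # ['a', '', '', 'b', '']
```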
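Now let's convert the index to a pd.Series so that we can use the .replace and .fillna methods of pd.Series, together with a working regex, to replace the relevant column names with NaN. Finally, we forward-fill (ffill) the missing names.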
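Putting those steps together, a minimal sketch on the toy frame above. It uses Series.ffill() rather than fillna(method='ffill'), since the method= form is deprecated in recent pandas (2.1+); wrapping the result in pd.Index is optional but keeps df.columns an Index explicitly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1], "Unnamed:1": [1], "Unnamed:2": [1], "b": [1], "Unnamed:3": [1]},
    columns=["a", "Unnamed:3", "Unnamed:1", "b", "Unnamed:2"],
)

# With regex=True and a non-string value, Series.replace swaps every label that
# matches the pattern for NaN; ffill then carries the last real name forward.
df.columns = pd.Index(
    pd.Series(df.columns).replace(r"Unnamed:\d+", np.nan, regex=True).ffill()
)
print(df.columns.tolist())  # ['a', 'a', 'a', 'b', 'b']
```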