I have a pandas.DataFrame:
import pandas as pd
df = pd.DataFrame( {'one': pd.Series([1., 2., 3.],
index=['a', 'b', 'c']),
'two': pd.Series([1., 2., 3., 4.],
index=['a', 'b', 'c', 'd']),
'three': pd.Series([0., 6., 1.],
index=['b', 'c', 'd']),
'two_': pd.Series([1., 2., 5, 4.],
index=['a', 'b', 'c', 'd'])})
which prints as:
print (df)
# one three two two_
#a 1 NaN 1 1
#b 2 0 2 2
#c 3 6 3 5
#d NaN 1 4 4
I have a map for renaming some of the columns:
name_map = {'one': 'one', 'two': 'two_'}
df.rename(columns=name_map)
# one three two_ two_
# a 1 NaN 1 1
# b 2 0 2 2
# c 3 6 3 5
# d NaN 1 4 4
(Occasionally name_map may map a column to itself, e.g. 'one' -> 'one'.) What I ultimately want is the object:
# one_ three two_
#a 1 NaN 1
#b 2 0 2
#c 3 6 3
#d NaN 1 4
How should I drop the potential duplicates before renaming?
Answer 0 (score: 2)
First get the common columns with list(set(name_map.values()) & set(df.columns)). Then drop() them and call rename() with columns=name_map.
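The steps above can be sketched as follows. One caveat: because name_map may map a column to itself ('one' -> 'one'), a plain intersection of the map's values with df.columns would also contain 'one' and drop the very column that should be kept. This sketch therefore assumes identity mappings should be skipped. The names `targets`, `to_drop`, and `result` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
    'three': pd.Series([0., 6., 1.], index=['b', 'c', 'd']),
    'two_': pd.Series([1., 2., 5., 4.], index=['a', 'b', 'c', 'd']),
})
name_map = {'one': 'one', 'two': 'two_'}

# New names that would actually change a column (identity mappings skipped;
# this is an assumption about the intended behaviour).
targets = {new for old, new in name_map.items() if old != new}

# Existing columns whose name would clash with a renamed column.
to_drop = [col for col in df.columns if col in targets]

# Drop the clashing columns first, then rename.
result = df.drop(columns=to_drop).rename(columns=name_map)
print(result)
```

With the data from the question, only the existing 'two_' column is dropped, so 'three' survives and the renamed 'two' takes over the 'two_' name.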
Answer 1 (score: 0)
I have an approach, but it looks a bit messy (handling the NaN values is what makes it messy):
import numpy as np  # pd.np is no longer available in recent pandas versions

potential_duplicates = [
    new
    for old, new in name_map.items()
    if new in df.columns  # the new column name already exists
    # ... and that column differs from the one to be renamed;
    # s[s == s] filters out NaN, because NaN != NaN
    and np.any(df[old][df[old] == df[old]] != df[new][df[new] == df[new]])
]
df.drop(potential_duplicates, axis=1, inplace=True)
df.rename(columns=name_map)
# one_ two_
#a 1 1
#b 2 2
#c 3 3
#d NaN 4
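A possibly tidier variant of the same idea, offered as a sketch rather than as this answer's method: Series.equals treats NaN values in the same position as equal, so the manual NaN filtering can be avoided. The names `clashes` and `result` are illustrative, and the final de-duplication step is an added assumption to cover the case where an existing column is identical to the renamed one.

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
    'three': pd.Series([0., 6., 1.], index=['b', 'c', 'd']),
    'two_': pd.Series([1., 2., 5., 4.], index=['a', 'b', 'c', 'd']),
})
name_map = {'one': 'one', 'two': 'two_'}

# A target name is a clash if a column with that name already exists
# and its contents differ from the column about to be renamed.
# An identity mapping compares a column with itself, so it never clashes.
clashes = [
    new
    for old, new in name_map.items()
    if new in df.columns and not df[old].equals(df[new])
]

result = df.drop(columns=clashes).rename(columns=name_map)

# If an existing column was identical to the renamed one, both now share
# a name; keep only the first occurrence of each column name.
result = result.loc[:, ~result.columns.duplicated()]
print(result)
```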
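Answer 2 (score: 0)
I think the simplest way is to drop the columns that are not among the keys of name_map (since you want to get rid of the existing two_ column before renaming two):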
In [74]: df
Out[74]:
one two two_
a 1 1 1
b 2 2 2
c 3 3 5
d NaN 4 4
In [76]: df.drop([col for col in df.columns if col not in name_map.keys()], axis=1)
Out[76]:
one two
a 1 1
b 2 2
c 3 3
d NaN 4
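For completeness, here is a sketch of this approach applied to the full DataFrame from the question and followed by the rename. Note that it keeps only the columns listed as keys of name_map, so 'three' is dropped as well, which differs from the output the question asks for. The names `kept` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'one': pd.Series([1., 2., 3.], index=['a', 'b', 'c']),
    'two': pd.Series([1., 2., 3., 4.], index=['a', 'b', 'c', 'd']),
    'three': pd.Series([0., 6., 1.], index=['b', 'c', 'd']),
    'two_': pd.Series([1., 2., 5., 4.], index=['a', 'b', 'c', 'd']),
})
name_map = {'one': 'one', 'two': 'two_'}

# Keep only the columns that appear as keys of name_map ...
kept = df[[col for col in df.columns if col in name_map]]

# ... then rename them; no clash is possible because every other
# column (including 'three' and the old 'two_') is already gone.
result = kept.rename(columns=name_map)
print(result)
```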