Question

I have a DataFrame with some duplicated index values:

import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(6, 6), columns=pd.date_range('1/1/2010', periods=6), index=["A", "C", "B", "F", "D", "E"])
df1.rename(index={"C": "A", "B": "E"}, inplace=True)

ipdb> df1
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
 A   -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
 A   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
 E   -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
 F    1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
 D    0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
 E   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560

I want to rename only the duplicated values and get a DataFrame like this:

ipdb> df1
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A      -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A_dp   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E      -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F       1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D       0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E_dp   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560

My approach:

(i) Build a dictionary of new names

old_names = df1[df1.index.duplicated()].index.values
new_names = df1[df1.index.duplicated()].index.values + "_dp"
dictionary = dict(zip(old_names, new_names))
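A quick check of what this dictionary actually does, assuming a small stand-in frame with the same index labels (the data is made up for illustration): `rename` looks labels up in the mapping, so the key `'A'` matches every row labelled `A`, including the first one that should stay unchanged.

```python
import pandas as pd

# Hypothetical stand-in for df1: same index labels, simple values
df = pd.DataFrame({"x": range(6)}, index=["A", "A", "E", "F", "D", "E"])

mask = df.index.duplicated()               # True for every occurrence after the first
old_names = df[mask].index.values          # 'A', 'E'
new_names = df[mask].index.values + "_dp"  # 'A_dp', 'E_dp'
dictionary = dict(zip(old_names, new_names))

# rename() maps by label, not by position, so BOTH 'A' rows and BOTH 'E' rows change
renamed = df.rename(index=dictionary)
print(dictionary)
print(list(renamed.index))
```

So even when applied to the whole frame, a label-to-label mapping cannot tell the first `A` from the second; the renaming has to be decided per position.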

(ii) Rename only the duplicated values

df1.loc[df1.index.duplicated(),:].rename(index = dictionary, inplace = True)

However, this doesn't seem to work.
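One likely reason it appears to do nothing: `df1.loc[mask, :]` with a boolean mask builds a new DataFrame, so `rename(..., inplace=True)` modifies that temporary object, which is thrown away right after, and `df1` keeps its old index. A minimal sketch on a hypothetical stand-in frame; the last step (assigning a position-by-position list to `df.index`) is just one possible fix, not taken from the answers below.

```python
import pandas as pd

# Hypothetical stand-in for df1 with the same index labels
df = pd.DataFrame({"x": range(6)}, index=["A", "A", "E", "F", "D", "E"])
dictionary = {"A": "A_dp", "E": "E_dp"}

# The call from the question: the boolean .loc selection is a new DataFrame,
# so inplace=True renames that temporary object, not df
df.loc[df.index.duplicated(), :].rename(index=dictionary, inplace=True)
before = list(df.index)
print(before)  # df itself is untouched

# Assigning a complete new index, decided row by row, does change df
dup = df.index.duplicated()
df.index = [f"{label}_dp" if d else label for label, d in zip(df.index, dup)]
print(list(df.index))
```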

Answer 1

You can use Index.where:

To make the duplicated index values unique, append `_dp` to every occurrence after the first:

df1.index = df1.index.where(~df1.index.duplicated(), df1.index + '_dp')
print (df1)
      2010-01-01  2010-01-02  2010-01-03  2010-01-04  2010-01-05  2010-01-06
A      -1.163883    0.593760    2.323342   -0.928527    0.058336   -0.209101
A_dp   -0.593566   -0.894161   -0.789849    1.452725    0.821477   -0.738937
E      -0.670305   -1.788403    0.134790   -0.270894    0.672948    1.149089
F       1.707686    0.323213    0.048503    1.168898    0.002662   -1.988825
D       0.403028   -0.879873   -1.809991   -1.817214   -0.012758    0.283450
E_dp   -0.224405   -1.803301    0.582946    0.338941    0.798908    0.714560
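How this works: `Index.where(cond, other)` keeps each label where `cond` is True and takes the label at the same position from `other` where it is False. `duplicated()` (default `keep='first'`) flags every occurrence after the first, so `~duplicated()` keeps the first occurrence of each label. A small sketch on a bare Index with hypothetical labels, which also shows a limitation: a label that appears three times gets `_dp` twice, so the result is still not unique.

```python
import pandas as pd

idx = pd.Index(["A", "A", "E", "F", "D", "E"])
first_seen = ~idx.duplicated()  # True for the first occurrence of each label
renamed = idx.where(first_seen, idx + "_dp")
print(list(renamed))
print(renamed.is_unique)

# With three copies of one label, both later copies become 'B_dp'
idx3 = pd.Index(["B", "B", "B"])
renamed3 = idx3.where(~idx3.duplicated(), idx3 + "_dp")
print(list(renamed3))
print(renamed3.is_unique)
```

If a label can occur more than twice, a numbered suffix (as in the next answer) is needed to guarantee uniqueness.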

Answer 2

I used jezrael's nice answer in this renaming function:

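import pandas as pd
def rn(df, suffix = '-duplicate-'):
    # cumcount() numbers repeats of each label 0, 1, 2, ...; the first occurrence ('0') gets an empty suffix
    appendents = (suffix + df.groupby(level=0).cumcount().astype(str).replace('0','')).replace(suffix, '')
    # append '-duplicate-1', '-duplicate-2', ... (or nothing) to each original label
    return df.set_index(df.index + appendents)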
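The key ingredient is `groupby(level=0).cumcount()`, which numbers the rows sharing an index label 0, 1, 2, ... in order of appearance; only rows numbered 1 or higher get the suffix. The same labels can be built without the double `replace`; a sketch assuming a list comprehension as a swapped-in alternative (the name `rn_list` is hypothetical):

```python
import pandas as pd

def rn_list(df, suffix="-duplicate-"):
    # Position of each row among the rows with the same index label: 0, 1, 2, ...
    n = df.groupby(level=0).cumcount()
    # Keep the label for the first occurrence, append suffix + counter otherwise
    new_labels = [f"{label}{suffix}{k}" if k > 0 else label for label, k in zip(df.index, n)]
    out = df.copy()
    out.index = new_labels
    return out

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7, 8, 9]}, index=['a' + str(i) for i in [1, 2, 3, 3, 4, 3, 5, 5, 6]])
out = rn_list(df)
print(out)
```

Because the counter is formatted per row with an f-string, this version does not depend on `'0'` being matched exactly by `replace`, and it also accepts non-string labels (they are converted to text only when a suffix is added).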

Then this:

df = pd.DataFrame({'a':[1,2,3,4,5,6,7,8, 9]}, index=['a'+str(i) for i in [1,2,3,3,4,3,5,5, 6]])
rn(df)

produces:

                a
a1              1
a2              2
a3              3
a3-duplicate-1  4
a4              5
a3-duplicate-2  6
a5              7
a5-duplicate-1  8
a6              9

Rename duplicated index values in a pandas DataFrame
