Question

I am using a Pandas DataFrame and matplotlib to produce multiple plots. However, when I change the DataFrame and remove one of the groups, I get this error:

ValueError: cannot reindex from a duplicate axis

My initial code is below, and it works fine, but it produces one extra group (plot) that I don't need:

branchGroups = allData['BranchGroupings'].unique()

fig2 = plt.figure(figsize = (15,15))

for i,branchGroups in enumerate(branchGroups):
    ax = plt.subplot(3,3,i+1)
    idx = allData['BranchGroupings'] == branchGroups

    kmf.fit(T[idx], C[idx], label=branchGroups)


    kmf.plot(ax=ax, legend=False)
    plt.title(branchGroups)
    plt.xlabel('Timeline in Months')
    plt.xlim(0,150)


fig2.tight_layout()
fig2.suptitle('Cumulative Hazard Function of Employee Groups', size = 16)
fig2.subplots_adjust(top=0.88, hspace = .4)
plt.show()

When I print branchGroups, it contains 7 items:

['BranchMgr', 'Banker', 'Service', 'MDOandRSM', 'SBRMandBBRM','FC', 'DE']

The code above produces all seven plots just fine (one plot per group), but I don't need the 'DE' grouping.

So I removed DE by doing the following:

#remove the DE from the data set
noDE = allData[allData.BranchGroupings != 'DE']

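This removes 'DE' from the categories and reduces the number of rows. I called head() on the result and it looks fine: a new DataFrame.

Then, to get 6 plots from the reduced DataFrame noDE, I used the same code with a few name changes, such as fig3 instead of fig2 and idxx instead of idx to avoid overwriting anything. Otherwise it is identical, except that it references the new DataFrame noDE: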

Groups = noDE['BranchGroupings'].unique()  #new data frame noDE

fig3 = plt.figure(figsize = (15,15))

for i,Groups in enumerate(Groups):
    ax = plt.subplot(3,2,i+1)
    idxx = noDE['BranchGroupings'] == Groups   #new idxx rather than idx

    kmf.fit(T[idxx], C[idxx], label=Groups)


    kmf.plot(ax=ax, legend=False)
    plt.title(Groups)
    plt.xlabel('Timeline in Months')
    plt.xlim(0,150)
    if i == 0:
        plt.ylabel('Frac Employed After $n$ Months')
    if i == 3:
        plt.ylabel('Frac Employed After $n$ Months')

fig3.tight_layout()
fig3.suptitle('Survivability of Branch Employees', size = 16)
fig3.subplots_adjust(top=0.88, hspace = .4)
plt.show()

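Except that now I get the error mentioned above:

ValueError: cannot reindex from a duplicate axis

and the traceback shows that it is tied to the line below: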

kmf.fit(T[idxx], C[idxx], label=Groups)

which is most likely caused by the assignment on the line above it:

idxx = noDE['BranchGroupings'] == Groups

Do I need to reset or drop something, or otherwise do something to the new DataFrame noDE, to clear this up?
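One possible reading, offered as a sketch rather than a diagnosis: the question does not show how T and C are defined, so assume they were taken from allData (for example from its Duration and Observed columns). In that case idxx is labelled like noDE while T and C are labelled like allData, and pandas has to align the mask to them before applying it, which it cannot do cleanly over duplicate labels. The sketch below uses an invented frame and builds T, C, and the mask from the same filtered frame, so the indexes are identical and no alignment is needed.

```python
import pandas as pd

# Hypothetical stand-in for allData; values and index labels are invented.
all_data = pd.DataFrame(
    {
        "BranchGroupings": ["Banker", "Service", "Banker", "DE"],
        "Duration": [12, 7, 45, 30],
        "Observed": [1, 1, 0, 1],
    },
    index=[0, 1, 1, 2],  # label 1 is repeated on purpose
)

no_de = all_data[all_data["BranchGroupings"] != "DE"]

# Assumption: T and C are the duration and event columns.
# Taking them from no_de gives them exactly the same index as the mask.
T = no_de["Duration"]
C = no_de["Observed"]

mask = no_de["BranchGroupings"] == "Banker"
banker_durations = T[mask].tolist()
banker_observed = C[mask].tolist()
print(banker_durations, banker_observed)
```

With T, C, and the mask all derived from noDE, the same selection would feed kmf.fit unchanged.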

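UPDATE - this has been solved; I'm not sure how 'pythonic' it is, but it works:

OK, after more research on this, it seems that a sliced DataFrame inherits its index from the original. I found this in another post here.

Initially, running the following check: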

noDE.index.is_unique returns False

To make a clean slice, the following steps are needed:
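A minimal sketch of why that check can return False, using a made-up frame (the values and index labels are invented for illustration): boolean filtering drops rows but keeps the labels of the rows that survive, so any duplicate labels already present in allData carry over to the filtered frame. Index.duplicated shows which labels repeat.

```python
import pandas as pd

# Hypothetical stand-in for allData; note the repeated index label 1.
all_data = pd.DataFrame(
    {
        "BranchGroupings": ["Banker", "Service", "Banker", "DE"],
        "Duration": [12, 7, 45, 30],
    },
    index=[0, 1, 1, 2],
)

# Filtering removes the 'DE' row but does not renumber the remaining rows.
no_de = all_data[all_data["BranchGroupings"] != "DE"]

index_is_unique = no_de.index.is_unique
# keep=False marks every occurrence of a repeated label, not just the later ones.
repeated_labels = no_de.index[no_de.index.duplicated(keep=False)].unique().tolist()
print(index_is_unique, repeated_labels)
```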

#create the slice using the .copy
noDE = allData[['ProdCat', 'Duration', 'Observed', 'BranchGroupings']].copy()

#remove the DE from the data set
noDE = noDE.loc[noDE['BranchGroupings'] != 'DE'] #use .loc for cleaner slice

#reset the index so that it is unique
noDE['index'] = np.arange(len(noDE))
noDE = noDE.set_index('index')

Now noDE.index.is_unique returns True and the error is gone.

Is reindexing required whenever a Pandas DataFrame is subset?
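For comparison, a shorter way to get a unique index is DataFrame.reset_index with drop=True, which discards the old labels and numbers the rows from 0. This is a sketch on an invented frame, not the code from the post; one difference from the np.arange/set_index steps above is that the resulting index is unnamed rather than named 'index'.

```python
import pandas as pd

# Hypothetical filtered frame that still carries a duplicate label.
no_de = pd.DataFrame(
    {
        "BranchGroupings": ["Banker", "Service", "Banker"],
        "Duration": [12, 7, 45],
    },
    index=[0, 1, 1],
)

# drop=True throws the old index away instead of adding it back as a column.
no_de_clean = no_de.reset_index(drop=True)

new_index = no_de_clean.index.tolist()
print(no_de_clean.index.is_unique, new_index)
```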

0 answers: