Question

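I want to delete rows from a pandas DataFrame using MultiIndex values.

I have tried a lot of things, but I am posting the attempt I think is closest. (I will actually explain the whole problem, since there may be an alternative solution that uses a completely different approach.) From a correlation matrix, I want to get the most strongly correlated pairs of columns. I use unstack and put the result in a DataFrame: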

In [263]: corr_df = pd.DataFrame(total.corr().unstack())

Then I select the higher correlations (actually, I should also pick up the negative ones):

In [264]: high = corr_df[(corr_df[0] > 0.5) & (corr_df[0] < 1.0)]

In [236]: print high
                                                  0
residual sugar       density               0.552517
free sulfur dioxide  total sulfur dioxide  0.720934
total sulfur dioxide free sulfur dioxide   0.720934
                     wine                  0.700357
density              residual sugar        0.552517
wine                 total sulfur dioxide  0.700357

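Close enough, but there are duplicates, which is in the nature of a correlation matrix. To clean them up, my idea was to iterate over high and drop the duplicates: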

In [267]:
for row in high.iterrows():
    print row[0][0], ",", row[0][1]
    print high.loc[row[0][1]].loc[row[0][0]].index
    high.drop(high.loc[row[0][1]].loc[row[0][0]].index)
residual sugar , density
Int64Index([0], dtype='int64')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-267-1258da2a4772> in <module>()
      2     print row[0][0], ",", row[0][1]
      3     print high.loc[row[0][1]].loc[row[0][0]].index
----> 4     high.drop(high.loc[row[0][1]].loc[row[0][0]].index)

...
[huge stack of errors]
...
KeyError: 0

The drop method works fine when the index is a regular one (see drop), but how do I build the MultiIndex key when all I have are the labels?

Answer 1

import pandas as pd

corr_df = pd.DataFrame(
{'residual sugar': [1, 0, 0, 0.552517, 0], 
'free sulfur dioxide': [0, 1, 0.720934, 0, 0], 
'total sulfur dioxide': [0, 0.720934, 1, 0, 0.700357],
'density': [0.552517, 0, 0, 1, 0],
'wine': [0, 0, 0.700357, 0, 1]}, 
index=['residual sugar', 'free sulfur dioxide', 'total sulfur dioxide', 'density', 'wine']).unstack()

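# Notice the slight modification to the original: corr_df is a Series here,
# so it is filtered directly instead of through column 0
high = corr_df[(corr_df > 0.5) & (corr_df < 1.0)]

# Sort by index, then by value. Both methods return a new object, so the
# result has to be assigned; a stable sort keeps each mirrored pair adjacent.
high = high.sort_index().sort_values(kind='mergesort')

# Drop every other value (e.g. just take the evens). This assumes every
# value appears exactly twice, once as (a, b) and once as (b, a).
result = high.iloc[::2]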
>>> result
density               residual sugar          0.552517
total sulfur dioxide  wine                    0.700357
free sulfur dioxide   total sulfur dioxide    0.720934
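
To address the literal question as well: the KeyError most likely comes from passing the `.index` of the innermost `.loc` result to `drop`, which is the column label `0` rather than a row label. A row of a MultiIndex is addressed by a tuple of its level values. The sketch below is only an illustration under assumptions: the hand-built `corr` frame stands in for `total.corr()`, and the names `dropped`, `deduped`, `seen` and `keep` are hypothetical. It also uses `abs()` so that negative correlations are picked up, and it removes mirrored pairs without relying on the sort order.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for total.corr(), built from the values in the question.
labels = ['residual sugar', 'free sulfur dioxide', 'total sulfur dioxide',
          'density', 'wine']
corr = pd.DataFrame(
    [[1, 0, 0, 0.552517, 0],
     [0, 1, 0.720934, 0, 0],
     [0, 0.720934, 1, 0, 0.700357],
     [0.552517, 0, 0, 1, 0],
     [0, 0, 0.700357, 0, 1]],
    index=labels, columns=labels)

# Series with a two-level MultiIndex; abs() also catches strong negative values.
stacked = corr.unstack()
high = stacked[(stacked.abs() > 0.5) & (stacked.abs() < 1.0)]

# 1) Drop a single row by its MultiIndex label, given as a tuple.
#    drop() returns a new object, so the result must be assigned.
dropped = high.drop([('density', 'residual sugar')])

# 2) Drop all mirrored pairs: keep the first occurrence of each unordered pair.
seen = set()
keep = []
for a, b in high.index:
    key = tuple(sorted((a, b)))   # (a, b) and (b, a) map to the same key
    keep.append(key not in seen)
    seen.add(key)
deduped = high[np.array(keep)]

print(deduped)
```

Unlike taking every other row, the second approach does not depend on each value appearing exactly twice, so it should still behave if two different pairs happen to share the same correlation.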
