KeyError when dropping labels in a pandas DataFrame

Time: 2019-10-13 01:31:55

Tags: python pandas

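I loaded a CSV dataset into a DataFrame. I want to show the strongest correlations between columns (the top 10 negative and the top 10 positive).

I found code on this site that I thought would help: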

def get_redundant_pairs(df):
    '''Get diagonal and lower triangular pairs of correlation matrix'''
    pairs_to_drop = set()
    cols = df.columns
    for i in range(0, df.shape[1]):
        for j in range(0, i+1):
            pairs_to_drop.add((cols[i], cols[j]))
    return pairs_to_drop


def get_top_abs_correlations(df, n=5):
    au_corr = df.corr().abs().unstack()
    labels_to_drop = get_redundant_pairs(df)
    au_corr = au_corr.drop(labels=labels_to_drop).sort_values(ascending=False)
    return au_corr[0:n]

I call this function on my DataFrame:

train = pd.read_csv('/content/drive/My Drive/DSF_HW3_Datasets/train.csv')
get_top_abs_correlations(train.loc[:, train.columns != 'Id'],10)

I get a KeyError:

KeyError: 'Foundation'

During handling of the above exception, another exception occurred:
....
/usr/local/lib/python3.6/dist-packages/pandas/core/indexes/multi.py in get_loc(self, key, method)
   2404 
   2405         if keylen == self.nlevels and self.is_unique: 
-> 2406             return self._engine.get_loc(key)
   2407 
   2408         # -- partial selection or non-unique index

 pandas/_libs/index.pyx in pandas._libs.index.BaseMultiIndexCodesEngine.get_loc()

 KeyError: ('Foundation', 'OverallQual')

How can I fix this error? The train.csv file: https://pastebin.com/vTh6md5W

1 answer:

Answer 0: (score: 0)
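
On the error itself: the KeyError comes from a mismatch between two sets of columns. The correlation matrix only covers numeric columns (older pandas versions silently leave out non-numeric ones such as 'Foundation'), while get_redundant_pairs builds its pairs from every column of df. drop() is then asked to remove labels such as ('Foundation', 'OverallQual') that do not exist in the unstacked index. A minimal sketch of one possible fix follows, building the pairs from the correlation matrix's own columns. The toy DataFrame is made up for illustration and only stands in for train.csv, and select_dtypes is used here as an assumption to keep the behaviour the same across pandas versions.

```python
import pandas as pd


def get_redundant_pairs(corr):
    """Diagonal and lower-triangular label pairs of a correlation matrix."""
    cols = corr.columns
    return {(cols[i], cols[j]) for i in range(len(cols)) for j in range(i + 1)}


def get_top_abs_correlations(df, n=5):
    # Keep only numeric columns so corr() behaves the same on old and new pandas.
    corr = df.select_dtypes(include="number").corr().abs()
    au_corr = corr.unstack()
    # Build the labels from corr.columns, not df.columns, so every label exists.
    labels_to_drop = list(get_redundant_pairs(corr))
    au_corr = au_corr.drop(labels=labels_to_drop).sort_values(ascending=False)
    return au_corr.iloc[:n]


# Hypothetical toy data standing in for train.csv; 'Foundation' is non-numeric.
toy = pd.DataFrame({
    "OverallQual": [5, 6, 7, 8, 9],
    "SalePrice": [100, 130, 160, 210, 250],
    "LotArea": [9, 3, 7, 1, 5],
    "Foundation": ["PConc", "CBlock", "PConc", "Slab", "CBlock"],
})
top = get_top_abs_correlations(toy, n=2)
print(top)
```

With this change the non-numeric column never reaches drop(), so no missing label is requested.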

You want a mask plus nlargest:

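import numpy as np

# get the correlation matrix (df is the DataFrame of interest, e.g. train without 'Id');
# on pandas 2.0+ non-numeric columns need df.corr(numeric_only=True)
corr = df.corr()

# boolean mask that keeps only the strict upper triangle (no diagonal)
mask = np.triu(np.ones_like(corr, dtype=bool), k=1)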

# get the upper triangle (excluding diagonal) by masking and stack:
corr = corr.where(mask).stack()

# 10 largest by absolute values
max10 = corr.abs().nlargest(10)

Output (max10):

GarageCars    GarageArea      0.882475
YearBuilt     GarageYrBlt     0.825667
GrLivArea     TotRmsAbvGrd    0.825489
TotalBsmtSF   1stFlrSF        0.819530
OverallQual   SalePrice       0.790982
GrLivArea     SalePrice       0.708624
2ndFlrSF      GrLivArea       0.687501
BedroomAbvGr  TotRmsAbvGrd    0.676620
BsmtFinSF1    BsmtFullBath    0.649212
YearRemodAdd  GarageYrBlt     0.642277
dtype: float64

To get the original (signed) correlations:

corr.loc[max10.index]

These happen to be identical to the absolute values, i.e. all ten are positive.
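
The question asks for the 10 most positive and the 10 most negative correlations separately, whereas the ranking above is by absolute value. One way to get both lists is to call nlargest and nsmallest on the signed upper-triangle values. The sketch below assumes a small made-up DataFrame, and the helper name upper_triangle_corr is hypothetical; with the real data, n would be 10.

```python
import numpy as np
import pandas as pd


def upper_triangle_corr(df):
    """Signed pairwise correlations, one entry per unordered column pair."""
    corr = df.select_dtypes(include="number").corr()
    # True strictly above the diagonal, False elsewhere.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    # where() sets masked-out cells to NaN; stack() flattens to a Series
    # and dropna() removes any NaN cells that are still present.
    return corr.where(mask).stack().dropna()


# Hypothetical toy data; replace with the real DataFrame (e.g. train without 'Id').
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 11],
    "c": [5, 4, 3, 2, 0],
    "d": [3, 1, 4, 1, 5],
})
pairs = upper_triangle_corr(toy)
top_positive = pairs.nlargest(2)   # most positive correlations
top_negative = pairs.nsmallest(2)  # most negative correlations
print(top_positive)
print(top_negative)
```

Note that nsmallest returns the lowest values whatever their sign, so if a dataset has fewer negative pairs than requested, the tail of that list will contain small positive correlations.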