I want to drop multiple columns (around 800) from a DataFrame using Python. I wrote the code below:
import pandas as pd

def corr_df(x, corr_val):
    # Creates Correlation Matrix and Instantiates
    corr_matrix = x.corr()
    iters = range(len(corr_matrix.columns) - 1)
    drop_cols = []
    df_drop = pd.DataFrame()
    cols = []
    # Iterates through Correlation Matrix Table to find correlated columns
    for i in iters:
        for j in range(i):
            item = corr_matrix.iloc[j:(j+1), (i+1):(i+2)]
            col = item.columns
            row = item.index
            val = item.values
            if val >= corr_val:
                # Prints the correlated feature set and the corr val
                #print(col.values[0], "|", row.values[0], "|", round(val[0][0], 2))
                drop_cols.append(i)
    drops = sorted(set(drop_cols))[::-1]
    df_dropped = x.drop(drops, axis=1)
    # Drops the correlated columns
    # for i in drops:
    #     col=(x.iloc[:, (i+1):(i+2)].columns.values.tolist())
    #     print (col)
    #     df_dropped=df.drop(col, axis=1)
    #cols.append()
    #print(df_dropped)
    return (df_dropped)
However, the DataFrame this code produces has only one column dropped. Any comments or suggestions?
Thanks in advance.
Answer 0 (score: 1)
To drop multiple columns by numeric index, do the following:
cols = [1069, 1068, 1067]
df = df.drop(df.columns[cols], axis=1)
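Applied to the function in the question, the same idea might look like the sketch below. It is not a confirmed fix for the asker's data, only an illustration under a few assumptions: the helper name `drop_correlated` and the `demo` frame are made up for this example, the inner loop is rewritten to compare each column with every earlier column, and the comparison uses the raw correlation (no absolute value), as in the question. The key point is that `drop` expects column labels, so the collected positions are converted to labels before dropping.

```python
import pandas as pd


def drop_correlated(df, corr_val):
    """Hypothetical helper: return df without columns whose correlation
    with any earlier column is at least corr_val."""
    corr_matrix = df.corr()
    n_cols = len(corr_matrix.columns)
    drop_positions = set()
    # Compare column i with every earlier column j (upper triangle only).
    for i in range(1, n_cols):
        for j in range(i):
            if corr_matrix.iloc[j, i] >= corr_val:
                drop_positions.add(i)
                break
    # drop() works on labels, so map the integer positions to column names.
    labels = [corr_matrix.columns[p] for p in sorted(drop_positions)]
    return df.drop(labels, axis=1)


# Made-up demo data: "b" and "d" are perfectly correlated with "a".
demo = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 1.0, 3.0, 2.0],
    "d": [1.5, 2.5, 3.5, 4.5],
})
result = drop_correlated(demo, 0.9)
print(list(result.columns))
```

Taking the labels from `corr_matrix.columns` rather than `df.columns` keeps the positions aligned with the correlation matrix even if `df` contains columns that `corr()` leaves out.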