Question

I have a dataframe that is the output of a LinearRegression() call and looks like this:

coeff_df = 
             Coefficient
pm  0.8297072586069981
sen 0.8199381072144118
tem 0.7483758123794492
no  0.2825715519743024
s_ref   -0.4376018493604922
ref -0.02338361622015777

I want to delete the coefficients that I consider insignificant, for example:

coeff_df_abs = abs(coeff_df)
highestcoeff = coeff_df_abs.max()
lowestcoeff = coeff_df_abs.min()
if highestcoeff.iloc[0] / lowestcoeff.iloc[0] > 10:
    pass  # delete lowestcoeff from coeff_df

I can get a new dataframe containing NaNs (or, after .dropna(), just a 1x1 dataframe):

new_coeffs = coeff_df[coeff_df_abs==coeff_df_abs.min()]
#output
        Coefficient
pm25    NaN
sen     NaN
tem     NaN
no      NaN
s_ref   NaN
ref     -0.023383616220157777

How do I then delete the one non-NaN cell in new_coeffs from the original dataframe coeff_df?

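Note that I can't select on the cell's value directly, because I am really testing for closeness to 0 rather than for 0 or <0, and I don't know in advance which cells are negative and which are positive.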

Thanks!

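edit: I should say that the actual intent is to run a new linear regression using only the coefficients that pass the test, so bonus points if I can get something that can be turned into a list and fed back into my X, y dataframes, e.g.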

possible_Xvars = ['pm','sen','tem']
X = dataset[possible_Xvars].values  #this already works in my code, just for clarity of ultimate goal

Answer 1

How do I then delete the one non-NaN cell in new_coeffs from the original dataframe coeff_df?

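Assuming the first column is your index, and new_coeffs has been reduced with .dropna() so that it holds only the row(s) to remove, simply use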

coeff_df.drop(new_coeffs.index)

which is equivalent to

coeff_df.drop(labels=new_coeffs.index, axis='index')

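where labels is a label name or a list of label names, and axis specifies whether those labels are looked up in the dataframe's index (rows) or in its columns. Note that drop returns a new dataframe rather than modifying coeff_df in place. See also: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.drop.html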
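To illustrate how labels and axis interact, here is a minimal sketch on a toy dataframe. The frame, its labels and its values are made up for this example and are not taken from the question.

```python
import pandas as pd

# Hypothetical toy frame, invented purely for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=["x", "y", "z"])

# axis="index" looks the label up among the row labels.
rows_dropped = df.drop(labels=["y"], axis="index")

# axis="columns" looks the label up among the column names.
cols_dropped = df.drop(labels=["b"], axis="columns")

# drop returns a new DataFrame; the original df is left unchanged.
print(rows_dropped.index.tolist())
print(cols_dropped.columns.tolist())
print(df.shape)
```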

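More generally, the index attribute is probably also the answer to the second part of your question. The index.tolist() method produces a list containing all label names currently present in the dataframe. So what you are looking for is: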

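# new_coeffs holds the row(s) being removed, so list the labels that remain after the drop
possible_Xvars = coeff_df.drop(new_coeffs.index).index.tolist()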
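Putting the pieces together, the following is a rough end-to-end sketch rather than a definitive implementation. It rebuilds coeff_df from the values shown in the question, assumes the ratio threshold of 10 from the question's pseudocode, and assumes the goal is the list of labels that remain after the smallest coefficient is dropped. The names ratio and kept are introduced here only for illustration.

```python
import pandas as pd

# Coefficients copied from the question; the index holds the variable names.
coeff_df = pd.DataFrame(
    {"Coefficient": [0.8297072586069981, 0.8199381072144118,
                     0.7483758123794492, 0.2825715519743024,
                     -0.4376018493604922, -0.02338361622015777]},
    index=["pm", "sen", "tem", "no", "s_ref", "ref"],
)

coeff_df_abs = coeff_df.abs()

# Keep only the row whose absolute value is the minimum; dropna() removes
# the NaN rows left behind by the boolean mask.
new_coeffs = coeff_df[coeff_df_abs == coeff_df_abs.min()].dropna()

# Ratio of the largest to the smallest absolute coefficient.
ratio = coeff_df_abs["Coefficient"].max() / coeff_df_abs["Coefficient"].min()

# Threshold of 10 taken from the question's pseudocode.
if ratio > 10:
    kept = coeff_df.drop(new_coeffs.index)  # drop by row label, not by value
else:
    kept = coeff_df

# Labels that survive the test, ready to select columns from the dataset.
possible_Xvars = kept.index.tolist()
print(possible_Xvars)
```

Because the row is selected by label, the sign of the coefficient never matters: the comparison is done on absolute values and the drop is done on the index.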
