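Apply a polyfit function to find the slope of each DataFrame column

Time: 2017-09-05 10:48:45

Tags: python pandas regression

I am trying to apply the following function to compute the slope and intercept for each DataFrame column: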

import pandas as pd
from scipy.stats import linregress

def fit_line(x, y):
    """Return slope, intercept of best fit line."""
    # Remove entries where either x or y is NaN.
    clean_data = pd.concat([x, y], axis=1).dropna(axis=0)  # drop rows
    # .items() replaces .iteritems(), which was removed in pandas 2.0.
    (_, x), (_, y) = clean_data.items()
    slope, intercept, r, p, stderr = linregress(x, y)
    return slope, intercept

I created a new DataFrame with two columns, but I don't know how to pass the first column as x and each of the other columns as y:

df['m'], df['b']  = df_freq.apply(fit_line(x?, y?), axis=1)

Here are the DataFrame's columns; all of the data are floats.

Index(['Time', '5', '10', '15', '20', '25', '30', '35', '40', '45', '50', '55', '60', '65', '70', '75', '80', '85', '90', '95', '100', '105', '110', '115', '120', '125', '130', '135', '140', '145', '150', '155', '160', '165', '170', '175', '180', '185', '190', '195', '200', '205', '210', '215', '220', '225', '230', '235', '240', '245'], dtype='object')

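1 Answer:

Answer 0: (score: 1)

Edit: Sorry, I misread your question.

Edit 2: Updated to account for append not operating in place by default.

I think the easiest way to get what you want is a for loop. Assuming the different columns hold the y values and the index holds the x values: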

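import pandas as pd
from scipy.stats import linregress

df_fit_parameter = pd.DataFrame()
for column in df_freq.columns:
    # Drop NaNs so x (the index) and y (the values) stay aligned.
    df_lin_fit = df_freq[column].dropna()
    slope, intercept, r, p, stderr = linregress(df_lin_fit.index, df_lin_fit)
    # DataFrame.append was removed in pandas 2.0; pd.concat also returns a new frame, so reassign.
    df_fit_parameter = pd.concat([df_fit_parameter, pd.DataFrame({'m': slope, 'b': intercept}, index=[column])])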
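
Since the question asks for the first column as x rather than the index, here is a minimal sketch of the same idea using apply. The helper name fit_column and the sample values in df_freq are made up for illustration, and it assumes the x column is named 'Time' as in the column listing from the question.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def fit_column(y, x):
    """Return slope (m) and intercept (b) of y against x, skipping NaN rows."""
    # Keep only rows where both x and y are present.
    mask = x.notna() & y.notna()
    result = linregress(x[mask], y[mask])
    return pd.Series({'m': result.slope, 'b': result.intercept})


# Hypothetical stand-in for df_freq: a 'Time' column plus two data columns.
df_freq = pd.DataFrame({
    'Time': [0.0, 1.0, 2.0, 3.0, 4.0],
    '5': [1.0, 3.0, 5.0, 7.0, 9.0],       # y = 2*t + 1
    '10': [4.0, np.nan, 3.0, 2.5, 2.0],   # y = -0.5*t + 4, with one gap
})

x = df_freq['Time']
# apply() hands each remaining column to fit_column as a Series;
# the transpose gives one row per original column, with columns m and b.
df_fit_parameter = df_freq.drop(columns='Time').apply(fit_column, x=x).T
print(df_fit_parameter)
```

The extra keyword argument x=x is forwarded by apply to fit_column, so the x column only has to be selected once. Handling the NaN mask per column means a gap in one column does not discard rows for the others.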