I'm trying to apply the following function to compute the slope and intercept for each DataFrame column:
import pandas as pd
from scipy.stats import linregress

def fit_line(x, y):
    """Return slope, intercept of best fit line."""
    # Remove entries where either x or y is NaN.
    clean_data = pd.concat([x, y], axis=1).dropna(axis=0)  # row-wise
    # DataFrame.iteritems() was removed in pandas 2.0; items() is the replacement.
    (_, x), (_, y) = clean_data.items()
    slope, intercept, r, p, stderr = linregress(x, y)
    return slope, intercept
I created a new DataFrame with two columns, but I really don't know how to pass the first column as x and each of the other columns as y.
df['m'], df['b'] = df_freq.apply(fit_line(x?, y?), axis=1)
These are the DataFrame's columns; all the data are floats:
Index(['Time', '5', '10', '15', '20', '25', '30', '35', '40', '45', '50', '55', '60', '65', '70', '75', '80', '85', '90', '95', '100', '105', '110', '115', '120', '125', '130', '135', '140', '145', '150', '155', '160', '165', '170', '175', '180', '185', '190', '195', '200', '205', '210', '215', '220', '225', '230', '235', '240', '245'], dtype='object')
Answer 0 (score: 1)
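Edit: Sorry, I misread your question.
Edit 2: Took into account that append does not work in place by default.
I think the easiest way to get what you want is a for loop. Assuming each column holds y values and the index holds the x values: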
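import pandas as pd
from scipy.stats import linregress

# DataFrame.append was removed in pandas 2.0, so collect the rows first and build the frame once.
rows = {}
for column in df_freq.columns:
    df_lin_fit = df_freq[column].dropna()  # drop NaNs in this column only
    slope, intercept, r, p, stderr = linregress(df_lin_fit.index, df_lin_fit)
    rows[column] = {'m': slope, 'b': intercept}
df_fit_parameter = pd.DataFrame.from_dict(rows, orient='index')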
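The loop above uses the index as x. If the x values live in the 'Time' column instead (as the column list in the question suggests), one option is to keep a variant of the question's fit_line and call it once per remaining column. This is a minimal sketch, assuming the x column is literally named 'Time'; the helper fit_all_columns and the example data are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress


def fit_line(x, y):
    """Return slope, intercept of the best-fit line, ignoring rows where x or y is NaN."""
    clean = pd.concat([x, y], axis=1).dropna(axis=0)  # drop rows with a NaN in either column
    slope, intercept, r, p, stderr = linregress(clean.iloc[:, 0], clean.iloc[:, 1])
    return slope, intercept


def fit_all_columns(df, x_col="Time"):
    """Hypothetical helper: fit every column except x_col against x_col.

    Returns one row per fitted column, with the slope in 'm' and the intercept in 'b'.
    """
    params = {
        col: fit_line(df[x_col], df[col])
        for col in df.columns
        if col != x_col
    }
    return pd.DataFrame.from_dict(params, orient="index", columns=["m", "b"])


# Hypothetical example data: column '5' is exactly 2*Time + 1,
# column '10' is -Time with one missing value.
df_freq = pd.DataFrame({
    "Time": [0.0, 1.0, 2.0, 3.0],
    "5": [1.0, 3.0, 5.0, 7.0],
    "10": [0.0, np.nan, -2.0, -3.0],
})

df_fit_parameter = fit_all_columns(df_freq)
print(df_fit_parameter)
```

Alternatively, `df_freq = df_freq.set_index('Time')` moves the x values into the index, after which the for loop above works unchanged, because 'Time' is no longer one of `df_freq.columns`.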