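I would like to use pandas and statsmodels to fit a linear model on subsets of a dataframe and return the predicted values. However, I am having trouble figuring out the right pandas idiom to use. Here is what I am trying to do: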
import pandas as pd
import statsmodels.formula.api as sm
import seaborn as sns
tips = sns.load_dataset("tips")
def fit_predict(df):
    m = sm.ols("tip ~ total_bill", df).fit()
    return pd.Series(m.predict(df), index=df.index)
tips["predicted_tip"] = tips.groupby("day").transform(fit_predict)
This raises the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-139-b3d2575e2def> in <module>()
----> 1 tips["predicted_tip"] = tips.groupby("day").transform(fit_predict)
/Users/mwaskom/anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in transform(self, func, *args, **kwargs)
3033 return self._transform_general(func, *args, **kwargs)
3034 except:
-> 3035 return self._transform_general(func, *args, **kwargs)
3036
3037 # a reduction transform
/Users/mwaskom/anaconda/lib/python2.7/site-packages/pandas/core/groupby.pyc in _transform_general(self, func, *args, **kwargs)
2988 group.T.values[:] = res
2989 else:
-> 2990 group.values[:] = res
2991
2992 applied.append(group)
ValueError: could not broadcast input array from shape (62) into shape (62,6)
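The error makes sense, in that I think .transform wants to map a DataFrame to a DataFrame. But is there a way to do a groupby operation on a DataFrame, pass each chunk to a function that reduces it to a Series (with the same index), and then combine the resulting Series into something that can be inserted into the original dataframe?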
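To illustrate the shape constraint, here is a minimal sketch on a made-up six-row frame (not the seaborn tips data). When a single column is selected, transform hands the function one group's values for that column and expects a result of the same length back, so the function never sees tip and total_bill together:

import pandas as pd

# Toy data, assumed purely for illustration.
tips = pd.DataFrame({
    "day": list("MMMFFF"),
    "tip": range(6),
    "total_bill": [10, 40, 20, 80, 50, 40],
})

# Each call receives only the "tip" values of one day
# and must return something of the same length.
centered = tips.groupby("day")["tip"].transform(lambda s: s - s.mean())
print(centered)

A model that needs two columns at once therefore does not fit this per-column pattern.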
Answer 0 (score: 2)
The top part here is the same; I'm just using a toy dataset because I'm behind a firewall.
tips = pd.DataFrame({ 'day':list('MMMFFF'), 'tip':range(6),
'total_bill':[10,40,20,80,50,40] })
def fit_predict(df):
    m = sm.ols("tip ~ total_bill", df).fit()
    return pd.Series(m.predict(df), index=df.index)
If you change 'transform' to 'apply', you get:
tips.groupby("day").apply(fit_predict)
day
F    3    2.923077
     4    4.307692
     5    4.769231
M    0    0.714286
     1    1.357143
     2    0.928571
That is not quite what you want, but if you drop index level 0 (the "day" level), you can proceed as desired:
tips['predicted'] = tips.groupby("day").apply(fit_predict).reset_index(level=0,drop=True)
  day  tip  total_bill  predicted
0   M    0          10   0.714286
1   M    1          40   1.357143
2   M    2          20   0.928571
3   F    3          80   2.923077
4   F    4          50   4.307692
5   F    5          40   4.769231
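For readers without statsmodels at hand, the same idea can be sketched with an explicit loop over the groups. This is only an illustration under stated assumptions: np.polyfit (degree 1) stands in for the sm.ols fit, and fit_predict_np is a hypothetical name, not part of the answer above. Assigning the concatenated Series back relies on index alignment, so no reset_index is needed:

import numpy as np
import pandas as pd

tips = pd.DataFrame({
    "day": list("MMMFFF"),
    "tip": range(6),
    "total_bill": [10, 40, 20, 80, 50, 40],
})

def fit_predict_np(df):
    # Assumption: a degree-1 least-squares fit replaces sm.ols("tip ~ total_bill").
    x = df["total_bill"].to_numpy()
    y = df["tip"].to_numpy()
    slope, intercept = np.polyfit(x, y, 1)
    # Keep the group's original index so the result aligns with tips.
    return pd.Series(slope * x + intercept, index=df.index)

# One Series per day; concat keeps the original row labels.
pieces = [fit_predict_np(group) for _, group in tips.groupby("day")]
tips["predicted"] = pd.concat(pieces)
print(tips)

On this toy data the predictions should agree with the table above, since both approaches fit an ordinary least-squares line per day.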
Answer 1 (score: 0)
Edit:
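I had to modify your fit_predict function to name the Series it returns. With that change, the predictions can be joined onto each group:
q.gps.apply(lambda df: df.join(q.fit_predict(df)))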
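One possible reading of this answer is sketched below: DataFrame.join accepts a Series only if it has a name, which would explain why the function had to be modified. The names fit_predict_named and predicted_tip are assumptions for illustration, np.polyfit again stands in for the statsmodels fit, and an explicit loop replaces the q.gps object, whose definition is not shown in the answer:

import numpy as np
import pandas as pd

tips = pd.DataFrame({
    "day": list("MMMFFF"),
    "tip": range(6),
    "total_bill": [10, 40, 20, 80, 50, 40],
})

def fit_predict_named(df):
    # Assumption: degree-1 least squares in place of sm.ols.
    x = df["total_bill"].to_numpy()
    y = df["tip"].to_numpy()
    slope, intercept = np.polyfit(x, y, 1)
    # The name becomes the column label when the Series is joined.
    return pd.Series(slope * x + intercept, index=df.index, name="predicted_tip")

# Join the named predictions onto each group, then restore the original row order.
parts = [group.join(fit_predict_named(group)) for _, group in tips.groupby("day")]
joined = pd.concat(parts).sort_index()
print(joined)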