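In the statistics problem below, I am trying to use a two-sample independent t-test in Python.
An analyst at a department store wants to evaluate a recent credit card promotion. To this end, 500 cardholders were randomly selected. Half received an ad promoting a reduced interest rate on purchases made over the next three months, and half received the standard seasonal ad. Did the promotion increase sales?
Below is my code. It raises an error when I run it; please help.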
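For reference, "did the promotion increase sales?" is a one-sided comparison of mean spending between the two groups. Under the pooled-variance assumption used in the question (equal_var=True), the standard textbook hypotheses and test statistic are as follows, where n_new and n_std are the group sizes and s_new, s_std the sample standard deviations:

```latex
H_0:\ \mu_{\text{new}} = \mu_{\text{std}}
\qquad
H_1:\ \mu_{\text{new}} > \mu_{\text{std}}

s_p^2 = \frac{(n_{\text{new}} - 1)\, s_{\text{new}}^2 + (n_{\text{std}} - 1)\, s_{\text{std}}^2}{n_{\text{new}} + n_{\text{std}} - 2}

t = \frac{\bar{x}_{\text{new}} - \bar{x}_{\text{std}}}{s_p \sqrt{\frac{1}{n_{\text{new}}} + \frac{1}{n_{\text{std}}}}},
\qquad
\text{df} = n_{\text{new}} + n_{\text{std}} - 2
```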
from scipy import stats
# cust: DataFrame with columns id, insert, dollars (loaded earlier)
std_promo = cust[cust['insert'] == 'Standard']
new_promo = cust[cust['insert'] == 'New Promotion']
print(std_promo.head(3))
print(new_promo.head(3))
     id    insert      dollars
0   148  Standard  2232.771979
2   973  Standard  2327.092181
3  1096  Standard  1280.030541
     id         insert      dollars
1   572  New Promotion  1403.807542
4  1541  New Promotion  1513.563200
5  1947  New Promotion  1729.627996
print(std_promo.mean(numeric_only=True))
print(new_promo.mean(numeric_only=True))
id 69003.000000
dollars 1566.389031
dtype: float64
id 64998.244000
dollars 1637.499983
dtype: float64
print(std_promo.std(numeric_only=True))
print(new_promo.std(numeric_only=True))
id 37753.106923
dollars 346.673047
dtype: float64
id 38508.218870
dollars 356.703169
dtype: float64
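Since the means and standard deviations of dollars are already printed above, the same pooled-variance test can be sketched directly from those summary statistics with scipy.stats.ttest_ind_from_stats. This assumes 250 cardholders per group (the 500 split in half, per the problem statement), which is an assumption since the group sizes are not printed; pandas .std() uses ddof=1, matching what ttest_ind uses internally. The alternative argument needs SciPy 1.6 or later.

```python
from scipy import stats

# Summary statistics for the dollars column, copied from the output above.
# Assumption: 250 cardholders in each group (500 split evenly).
res = stats.ttest_ind_from_stats(
    mean1=1637.499983, std1=356.703169, nobs1=250,  # New Promotion
    mean2=1566.389031, std2=346.673047, nobs2=250,  # Standard
    equal_var=True,          # pooled-variance t-test, as in the question
    alternative="greater",   # H1: New Promotion mean > Standard mean
)
print(res)
```

Putting the New Promotion group first makes alternative="greater" match the question of whether the promotion increased spending.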
stats.ttest_ind(a= std_promo, b= new_promo, equal_var= True)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-76-b40f7d9d7a3e> in <module>
1 stats.ttest_ind(a= std_promo,
----> 2 b= new_promo)
~\Anaconda3\lib\site-packages\scipy\stats\stats.py in ttest_ind(a, b, axis, equal_var, nan_policy)
4163 return Ttest_indResult(np.nan, np.nan)
4164
-> 4165 v1 = np.var(a, axis, ddof=1)
4166 v2 = np.var(b, axis, ddof=1)
4167 n1 = a.shape[axis]
~\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py in var(a, axis, dtype, out, ddof, keepdims)
3365
3366 return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
-> 3367 **kwargs)
3368
3369
~\Anaconda3\lib\site-packages\numpy\core\_methods.py in _var(a, axis, dtype, out, ddof, keepdims)
108 if isinstance(arrmean, mu.ndarray):
109 arrmean = um.true_divide(
--> 110 arrmean, rcount, out=arrmean, casting='unsafe', subok=False)
111 else:
112 arrmean = arrmean.dtype.type(arrmean / rcount)
TypeError: unsupported operand type(s) for /: 'str' and 'int'
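The traceback shows where it breaks: ttest_ind converts each input to an array and calls np.var on it. A whole DataFrame with a text column becomes an object array, so the per-column sums include concatenated strings, and dividing a string by the row count fails. A minimal reproduction, using a hypothetical six-row frame built from the rows printed above (dollars rounded), with the fix of passing only the numeric column:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical mini-frame mirroring the question's columns (id, insert, dollars).
demo = pd.DataFrame({
    "id": [148, 973, 1096, 572, 1541, 1947],
    "insert": ["Standard"] * 3 + ["New Promotion"] * 3,
    "dollars": [2232.77, 2327.09, 1280.03, 1403.81, 1513.56, 1729.63],
})


def reproduce_error(df):
    """Return the TypeError message raised by np.var on the whole frame, or None."""
    arr = np.asarray(df)  # mixed int/str/float columns -> object dtype
    try:
        np.var(arr, axis=0, ddof=1)  # the variance call ttest_ind makes internally
    except TypeError as exc:
        return str(exc)
    return None


print(reproduce_error(demo))

# Fix: select only the numeric dollars column for each group.
fixed = stats.ttest_ind(
    demo.loc[demo["insert"] == "Standard", "dollars"],
    demo.loc[demo["insert"] == "New Promotion", "dollars"],
    equal_var=True,
)
print(fixed)
```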
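Answer (score: 2)
The error occurs because you pass the entire DataFrames to ttest_ind. They include the text column insert (and the id column), so when SciPy computes the variance with np.var, the column sums contain concatenated strings that cannot be divided by the sample count. Pass only the numeric dollars column instead. Change: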
stats.ttest_ind(a= std_promo, b= new_promo, equal_var= True)
to
stats.ttest_ind(a= std_promo.dollars, b= new_promo.dollars, equal_var= True)
I created a DataFrame similar to yours and ran the test on the dollars column; it returned:
Ttest_indResult(statistic=7.144078895160622, pvalue=9.765848295636031e-05)
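An end-to-end sketch of the corrected approach follows. The data here is synthetic and hypothetical (normal draws with means and standard deviations close to those printed in the question, 250 rows per group), so its numbers will not match either the question's data or the result above. It runs the two-sided test from the answer and, because the question asks whether the promotion increased sales, also a one-sided test (alternative="greater", SciPy 1.6 or later).

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical stand-in for `cust`: synthetic data, not the asker's real data.
rng = np.random.default_rng(0)
n = 250  # assumption: 500 cardholders split evenly
cust = pd.DataFrame({
    "id": np.arange(2 * n),
    "insert": ["Standard"] * n + ["New Promotion"] * n,
    "dollars": np.concatenate([
        rng.normal(1566, 347, n),  # Standard group
        rng.normal(1637, 357, n),  # New Promotion group
    ]),
})

std_promo = cust[cust["insert"] == "Standard"]
new_promo = cust[cust["insert"] == "New Promotion"]

# Two-sided pooled-variance test on the numeric column only, as in the answer.
two_sided = stats.ttest_ind(std_promo["dollars"], new_promo["dollars"], equal_var=True)

# One-sided test, H1: mean(New Promotion) > mean(Standard).
one_sided = stats.ttest_ind(
    new_promo["dollars"], std_promo["dollars"],
    equal_var=True, alternative="greater",
)

print(two_sided)
print(one_sided)
```

Swapping the argument order flips the sign of the statistic, so the one-sided call lists the New Promotion group first. Passing equal_var=False instead gives Welch's t-test, which drops the equal-variance assumption.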