Hi, I'm new to pandas, coming from a SAS background, and I'm trying to split a continuous variable into bands using the following code.
var_range = df['BILL_AMT1'].max() - df['BILL_AMT1'].min()
a = 10
for i in range(1, a):
    inc = var_range/a
    lower_bound = df['BILL_AMT1'].min() + (i-1)*inc
    print('Lower bound is '+str(lower_bound))
    upper_bound = df['BILL_AMT1'].max() + (i)*inc
    print('Upper bound is '+str(upper_bound))
    if (lower_bound <= df['BILL_AMT1'] < upper_bound):
        df['bill_class'] = i
    i+=1
I expect the code to check whether the value of df['BILL_AMT1'] falls within the current loop's bounds and to set df['bill_class'] accordingly.
I get the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
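I think the if condition evaluates correctly and that the error comes from assigning the for-loop counter's value to the new column.
Can anyone explain what is going wrong or suggest an alternative?
Answer 0 (score: 2)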
To avoid the ValueError, change
if (lower_bound <= df['BILL_AMT1'] < upper_bound):
    df['bill_class'] = i
to
mask = (lower_bound <= df['BILL_AMT1']) & (df['BILL_AMT1'] < upper_bound)
df.loc[mask, 'bill_class'] = i
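The chained comparison (lower_bound <= df['BILL_AMT1'] < upper_bound) is equivalent to
(lower_bound <= df['BILL_AMT1']) and (df['BILL_AMT1'] < upper_bound)
The and operator causes the two boolean Series, (lower_bound <= df['BILL_AMT1']) and (df['BILL_AMT1'] < upper_bound), to be evaluated in a boolean context, i.e. reduced to a single boolean value. Pandas refuses to reduce a Series to a single boolean value.
Instead, to get a boolean Series back, use the & operator in place of and: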
mask = (lower_bound <= df['BILL_AMT1']) & (df['BILL_AMT1'] < upper_bound)
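The difference between and and & can be seen on a small example. This is only a sketch: the Series s and the two bounds are made-up values standing in for df['BILL_AMT1'] and the loop's bounds.

```python
import pandas as pd

# Hypothetical data standing in for df['BILL_AMT1'].
s = pd.Series([5, 15, 25, 35])
lower_bound, upper_bound = 10, 30

# The chained comparison implies `and`, which needs a single True/False
# from a whole Series, so pandas raises ValueError.
try:
    lower_bound <= s < upper_bound
    raised = False
except ValueError:
    raised = True

# `&` combines the two comparisons element by element instead.
mask = (lower_bound <= s) & (s < upper_bound)

print(raised)         # True
print(mask.tolist())  # [False, True, True, False]
```

The parentheses around each comparison matter, because & binds more tightly than <= and <.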
Then, to assign values to the bill_class column where mask is True, use df.loc:
df.loc[mask, 'bill_class'] = i
To bin the values in df['BILL_AMT1'], you can drop the Python for-loop entirely and, as DSM suggests, use pd.cut:
df['bill_class'] = pd.cut(df['BILL_AMT1'], bins=10, labels=False)+1
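As a rough illustration of what the pd.cut approach produces, here is a minimal sketch on made-up data (the six values below are hypothetical, not taken from the question):

```python
import pandas as pd

# Hypothetical data; only the column name comes from the question.
df = pd.DataFrame({'BILL_AMT1': [0, 10, 25, 50, 75, 100]})

# bins=10 asks for ten equal-width bins over the observed range;
# labels=False returns integer bin codes 0..9, so +1 shifts them to 1..10.
df['bill_class'] = pd.cut(df['BILL_AMT1'], bins=10, labels=False) + 1

print(df['bill_class'].tolist())  # [1, 1, 3, 5, 8, 10]
```

Note that pd.cut closes each interval on the right by default (right=True), so a value sitting exactly on an edge, such as 10 here, lands in the lower bin. The loop version uses lower_bound <= x < upper_bound, which puts it in the upper one; passing right=False to pd.cut is one way to get left-closed bins if that matters.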
Answer 1 (score: 0)
IIUC, this should be a fix for your code:
mx, mn = df['BILL_AMT1'].max(), df['BILL_AMT1'].min()
rng = mx - mn
a = 10
for i in range(a):
    inc = rng / a
    lower_bound = mn + i * inc
    print('Lower bound is ' + str(lower_bound))
    upper_bound = mn + (i + 1) * inc if i + 1 < a else mx
    print('Upper bound is ' + str(upper_bound))
    ge = df['BILL_AMT1'].ge(lower_bound)
    # lt is strict, so rows equal to mx are not matched by the last band
    lt = df['BILL_AMT1'].lt(upper_bound)
    df.loc[ge & lt, 'bill_class'] = i
However, I would do this:
df['bill_class'] = pd.qcut(df['BILL_AMT1'], 10, list(range(10)))
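It may help to see that pd.qcut and pd.cut answer different questions: pd.cut makes equal-width bins, while pd.qcut makes bins holding roughly equal numbers of rows. A minimal sketch on made-up, skewed data (the Series below is hypothetical, and two bins are used only to keep the output short):

```python
import pandas as pd

# Hypothetical skewed data: nine small values and one large outlier.
s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 1000])

# Equal-width bins: the outlier stretches the range, so almost
# everything falls into the first bin.
equal_width = pd.cut(s, bins=2, labels=False)

# Quantile-based bins: the split is at the median, so each bin
# gets half of the rows.
equal_count = pd.qcut(s, 2, labels=False)

print(equal_width.value_counts().sort_index().tolist())  # [9, 1]
print(equal_count.value_counts().sort_index().tolist())  # [5, 5]
```

If the question really wants fixed-width bands of BILL_AMT1, pd.cut is the closer match to the original loop; pd.qcut is the better fit when balanced class sizes are the goal. With many repeated values, pd.qcut can raise a ValueError about duplicate bin edges unless duplicates='drop' is passed.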