I have this dataset:
Customer_ID Gender First_Date First_region First_state First_city \
0 129609144 M 20130130 West Gujarat Surat
1 129627580 M 20130129 North Delhi Delhi
2 130363481 M 20130221 West Gujarat Surat
3 49817480 M 20130222 West Maharashtra Pimpri-Chinchwad
4 126343829 F 20130301 North Delhi Delhi
Recent_Date Last_region Last_state Last_city Customer_Value \
0 20130216 West Gujarat Surat 2032.0
1 20130129 North Delhi Delhi 1709.0
2 20130221 West Gujarat Surat 523.0
3 20130222 West Maharashtra Pimpri-Chinchwad 5132.0
4 20130301 North Delhi Delhi 1008.0
Buy_Times Points_Earned Points_Redeemed
0 2 200.0 0.0
1 1 100.0 0.0
2 1 10.0 0.0
3 1 170.0 0.0
4 1 60.0 0.0
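I am trying to create a new column named 'customer value segment', and I want to assign its values based on the values in the Customer_Value column.
So I tried this approach: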
df['customer value segment'] = np.where(df['Customer_Value'] > 25000, 'High Value Segment', np.where(10000 > df['Customer_Value'] > 25000, 'Medium Value Segment', np.where(df['Customer_Value'] <= 10000, 'Low Value Segment', 'None')))
However, no luck. It raises the following error:
ValueError Traceback (most recent call last)
<ipython-input-48-fee1062f32ba> in <module>
----> 1 df['customer value segment'] = np.where(df['Customer_Value'] > 25000, 'High Value Segment', np.where(10000 > df['Customer_Value'] > 25000, 'Medium Value Segment', np.where(df['Customer_Value'] <= 10000, 'Low Value Segment', 'None')))
~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
1476 raise ValueError("The truth value of a {0} is ambiguous. "
1477 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1478 .format(self.__class__.__name__))
1479
1480 __bool__ = __nonzero__
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
How should I handle this?
Note: in case you want to read the actual dataset, this is how I load it:
df = pd.read_csv('Customers.csv', encoding='unicode_escape')
Answer 0 (score: 2)
This should work:
df.loc[df['Customer_Value'] > 25000, 'customer value segment'] = 'High Value Segment'
df.loc[(df['Customer_Value'] >= 10000) & (df['Customer_Value'] <= 25000), 'customer value segment'] = 'Medium Value Segment'
df.loc[df['Customer_Value'] < 10000, 'customer value segment'] = 'Low Value Segment'
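If you prefer a single assignment, the same three conditions can probably be expressed with np.select. This is only a sketch: the sample values below are made up to hit every segment and both boundaries (they are not from Customers.csv), and the 'None' default is an assumption for rows that match no condition, such as missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, chosen to cover all three segments and both boundaries.
df = pd.DataFrame({"Customer_Value": [523.0, 10000.0, 17500.0, 25000.0, 30000.0]})

value = df["Customer_Value"]

# Same boundaries as the three .loc assignments above.
conditions = [
    value > 25000,
    (value >= 10000) & (value <= 25000),
    value < 10000,
]
labels = ["High Value Segment", "Medium Value Segment", "Low Value Segment"]

# Rows matching none of the conditions (e.g. NaN) get the default label.
df["customer value segment"] = np.select(conditions, labels, default="None")
segments = df["customer value segment"].tolist()
```

Each condition is a boolean Series, and the parentheses around the two halves of the middle condition matter because & binds more tightly than the comparison operators.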
Answer 1 (score: 1)
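Hmm.
np.where expects an array-like condition. The string arguments themselves should be fine, because np.where broadcasts scalars. The more likely culprit is the chained comparison 10000 > df['Customer_Value'] > 25000: Python expands it to (10000 > s) and (s > 25000), and the "and" asks for the truth value of a whole Series, which is exactly what the error message complains about. That condition could also never be true, since no value is both below 10000 and above 25000.
That said, I would actually just iterate over the dataframe and check each value with if/elif/else: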
newCol = []
for ind in df.index:
    # Chained comparisons are fine here because each value is a scalar.
    if df['Customer_Value'][ind] > 25000:
        newCol.append('High Value Segment')
    elif 10000 <= df['Customer_Value'][ind] <= 25000:
        newCol.append('Medium Value Segment')
    else:
        newCol.append('Low Value Segment')
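Then add the list to the dataframe as a new column: df['customer value segment'] = newCol. I typed this directly here, so the whitespace may be off and you may have to fix it in your editor. Let me know if it works.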
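For reference, the nested np.where from the question can likely be kept if the chained comparison is replaced by an element-wise test. The sketch below first reproduces the ValueError and then uses Series.between for the middle range. The sample values are hypothetical, and treating both 10000 and 25000 as Medium is an assumption, since the question leaves the boundaries ambiguous.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, not taken from the real dataset.
df = pd.DataFrame({"Customer_Value": [523.0, 10000.0, 17500.0, 25000.0, 30000.0]})
value = df["Customer_Value"]

# The chained comparison expands to (10000 > value) and (value > 25000);
# "and" needs a single True/False, so pandas raises ValueError.
try:
    10000 > value > 25000
    chained_error = None
except ValueError as exc:
    chained_error = str(exc)

# Element-wise version: between() is inclusive on both ends by default.
df["customer value segment"] = np.where(
    value > 25000,
    "High Value Segment",
    np.where(
        value.between(10000, 25000),
        "Medium Value Segment",
        np.where(value < 10000, "Low Value Segment", "None"),
    ),
)
segments = df["customer value segment"].tolist()
```

Writing (value >= 10000) & (value <= 25000) instead of between() should give the same result. Either way it avoids the Python-level loop, which may matter on a large customer table.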
Answer 2 (score: 1)
Try the following list comprehension:
df["customer value segment"] = ["High Value Segment" if x > 25000 else "Medium Value Segment" if x > 10000 else "Low Value Segment" for x in df["Customer_Value"]]
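A possible alternative with the same boundaries is pd.cut, which bins the values directly. This is a sketch on made-up sample values. It relies on pd.cut's default right-closed bins, so exactly 10000 lands in Low and exactly 25000 in Medium, matching the comprehension above. Note that the resulting column is categorical rather than plain strings.

```python
import pandas as pd

# Hypothetical sample values, chosen to sit on and around both boundaries.
df = pd.DataFrame({"Customer_Value": [523.0, 10000.0, 17500.0, 25000.0, 30000.0]})

# Default right=True gives bins (-inf, 10000], (10000, 25000], (25000, inf].
df["customer value segment"] = pd.cut(
    df["Customer_Value"],
    bins=[float("-inf"), 10000, 25000, float("inf")],
    labels=["Low Value Segment", "Medium Value Segment", "High Value Segment"],
)

# Convert the categorical labels to plain strings for comparison or export.
segments = df["customer value segment"].astype(str).tolist()

# The list comprehension from this answer, for a side-by-side check.
by_comprehension = [
    "High Value Segment" if x > 25000
    else "Medium Value Segment" if x > 10000
    else "Low Value Segment"
    for x in df["Customer_Value"]
]
```

The categorical dtype keeps the segments ordered from Low to High, which can be handy for sorting or grouping later.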