Create a new column in a DataFrame based on the sum of three other columns

Time: 2017-08-18 14:16:48

Tags: python pandas conditional where

I wrote the following code:

data['Customer_segment'] = np.where(((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=5,1),
np.where((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])>5 & (data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=8,2),
np.where((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])>8 & (data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=11,3),
np.where((data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])>11 & (data['Order frequency segment']+data['order_size_seg']+data['MOVC % segment'])<=14,4),5)

I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
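
The error most likely comes from operator precedence: & binds more tightly than > and <=, so an expression like `x > 5 & x <= 8` is parsed as the chained comparison `x > (5 & x) <= 8`. Python joins the two halves of a chained comparison with `and`, which calls bool() on a Series and raises this ValueError. The small reproduction below uses hypothetical row sums (`s` is an illustrative name, not from the question) and shows that wrapping each comparison in parentheses fixes it:

```python
import pandas as pd

# Hypothetical row sums, only to reproduce the error
s = pd.Series([6, 8, 15])

# Without parentheses this parses as `s > (5 & s) <= 8`: a chained comparison
# that Python joins with `and`, which calls bool() on a Series and raises.
try:
    s > 5 & s <= 8
    error_message = ""
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With parentheses, each comparison runs first and & combines the results element-wise
mask = (s > 5) & (s <= 8)
print(mask.tolist())  # [True, True, False]
```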

I'd really appreciate help getting to the best solution; I suspect the approach I'm attempting is not optimal.
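
For reference, a nested np.where approach can work once each call gets the three arguments it expects (condition, value if true, value if false) and the next np.where sits in the "false" branch. A minimal sketch, assuming the sample input from the question as data; because the branches are tested in order, only the upper bound of each range needs checking:

```python
import numpy as np
import pandas as pd

# Sample input from the question
data = pd.DataFrame({
    'MOVC % segment': [1, 5, 5],
    'order_size_seg': [2, 2, 5],
    'Order frequency segment': [3, 1, 5],
})

a = data['Order frequency segment'] + data['order_size_seg'] + data['MOVC % segment']

# Each np.where is (condition, value_if_true, value_if_false); the false branch
# holds the next np.where, so the ranges are checked from lowest to highest.
data['Customer_segment'] = np.where(a <= 5, 1,
                           np.where(a <= 8, 2,
                           np.where(a <= 11, 3,
                           np.where(a <= 14, 4, 5))))
print(data['Customer_segment'].tolist())  # [2, 2, 5]
```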

Sample input:

MOVC % segment  order_size_seg  Order frequency segment
1               2               3
5               2               1
5               5               5

I'm trying to add a column based on the sum of each row, as follows:

sum 3-5   -> 1
sum 6-8   -> 2
sum 9-11  -> 3
sum 12-14 -> 4
sum 15+   -> 5
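
To make the rules concrete, the sketch below computes the row sums for the sample input and applies the mapping with a plain-Python reference function. The helper name `segment` is hypothetical, and it treats any sum of 5 or less as segment 1, as the answers below do. Series.map calls Python once per row, so it is meant as a readable check rather than the fast approach:

```python
import pandas as pd

# Sample input from the question
data = pd.DataFrame({
    'MOVC % segment': [1, 5, 5],
    'order_size_seg': [2, 2, 5],
    'Order frequency segment': [3, 1, 5],
})

cols = ['Order frequency segment', 'order_size_seg', 'MOVC % segment']
row_sum = data[cols].sum(axis=1)  # per-row total of the three segment columns

def segment(total):
    """Hypothetical reference implementation of the ranges listed above."""
    if total <= 5:
        return 1
    if total <= 8:
        return 2
    if total <= 11:
        return 3
    if total <= 14:
        return 4
    return 5

segments = row_sum.map(segment)
print(row_sum.tolist(), segments.tolist())  # [6, 8, 15] [2, 2, 5]
```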

Any help with this would be much appreciated.

3 answers:

Answer 0 (score: 3)

I think you need numpy.select instead of multiple np.where calls:

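import numpy as np
import pandas as pd

# sum the three columns only once
a = data['Order frequency segment'] + data['order_size_seg'] + data['MOVC % segment']
# wrap each condition in (), because & binds more tightly than the comparison operators
m1 = a <= 5
m2 = (a > 5) & (a <= 8)
m3 = (a > 8) & (a <= 11)
m4 = (a > 11) & (a <= 14)

# np.select returns the choice paired with the first True condition, else the default
data['Customer_segment'] = np.select([m1, m2, m3, m4], [1, 2, 3, 4], default=5)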
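
A self-contained version of the np.select approach on the sample input is sketched below. Since np.select takes the first condition that is True, the lower bounds are redundant and the conditions can be written as plain upper bounds; this shortening is an editorial simplification, equivalent to the masks above:

```python
import numpy as np
import pandas as pd

# Sample input from the question
data = pd.DataFrame({
    'MOVC % segment': [1, 5, 5],
    'order_size_seg': [2, 2, 5],
    'Order frequency segment': [3, 1, 5],
})

a = data['Order frequency segment'] + data['order_size_seg'] + data['MOVC % segment']

# Conditions are checked in order, so each one only needs its upper bound
conditions = [a <= 5, a <= 8, a <= 11, a <= 14]
choices = [1, 2, 3, 4]
data['Customer_segment'] = np.select(conditions, choices, default=5)
print(data['Customer_segment'].tolist())  # [2, 2, 5]
```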

Another solution is to use pd.cut:

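bins = [-np.inf, 5, 8, 11, 14, np.inf]
labels = [1, 2, 3, 4, 5]

# bin the row sums a; intervals are right-closed: (-inf, 5], (5, 8], (8, 11], (11, 14], (14, inf)
data['Customer_segment'] = pd.cut(a, bins=bins, labels=labels)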
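
pd.cut returns a categorical Series, so the sketch below converts the labels to plain integers with `.astype(int)`; that conversion is an assumption about what the column should hold, and can be dropped if a categorical is fine. It runs on the sample input from the question:

```python
import numpy as np
import pandas as pd

# Sample input from the question
data = pd.DataFrame({
    'MOVC % segment': [1, 5, 5],
    'order_size_seg': [2, 2, 5],
    'Order frequency segment': [3, 1, 5],
})

a = data[['Order frequency segment', 'order_size_seg', 'MOVC % segment']].sum(axis=1)

bins = [-np.inf, 5, 8, 11, 14, np.inf]
labels = [1, 2, 3, 4, 5]

# Right-closed bins by default, so a sum of exactly 8 lands in (5, 8] -> label 2
segment = pd.cut(a, bins=bins, labels=labels)
data['Customer_segment'] = segment.astype(int)  # categorical -> plain integers
print(data['Customer_segment'].tolist())  # [2, 2, 5]
```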

Answer 1 (score: 2)

How about pd.cut?
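
pd.cut is right-closed binning; if a plain NumPy array is preferred over a categorical, np.digitize performs the same binning. This is a swapped-in NumPy technique, not part of the answer. With `right=True`, np.digitize returns i such that `edges[i-1] < x <= edges[i]`, so adding 1 gives the segment number. The row sums below are those of the sample input:

```python
import numpy as np
import pandas as pd

a = pd.Series([6, 8, 15])  # row sums of the sample input
edges = [5, 8, 11, 14]     # inner bin edges; values above 14 fall past the last edge

# right=True makes the bins right-closed, matching pd.cut's default
segment = np.digitize(a, edges, right=True) + 1
print(segment.tolist())  # [2, 2, 5]
```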

Answer 2 (score: 1)

How about the query method? It seems to have a very powerful syntax:

import pandas as pd
d = pd.DataFrame([[1,2,3],[5,2,1],[5,5,5]], columns=['M','O','F'])
# rows whose sum falls in the 6-8 range, i.e. segment 2
d.query("5 < M+O+F <= 8")

Out[4]: 
   M  O  F
0  1  2  3
1  5  2  1
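
query only filters rows; it does not create the new column by itself. One way to use it for the segment column is sketched below, assuming DataFrame.eval to add the row sum and a hypothetical `bounds` table of (exclusive lower, inclusive upper, label) triples; `@` refers to local Python variables inside the query string. Looping over segments is less direct than np.select or pd.cut, so treat this as an illustration of the syntax:

```python
import pandas as pd

d = pd.DataFrame([[1, 2, 3], [5, 2, 1], [5, 5, 5]], columns=['M', 'O', 'F'])

# eval with an assignment returns a new DataFrame that includes column S
d = d.eval('S = M + O + F')

# Hypothetical segment table: (lower bound exclusive, upper bound inclusive, label)
bounds = [(float('-inf'), 5, 1), (5, 8, 2), (8, 11, 3), (11, 14, 4), (14, float('inf'), 5)]

d['segment'] = 0
for low, high, label in bounds:
    idx = d.query('@low < S <= @high').index  # rows whose sum is in this range
    d.loc[idx, 'segment'] = label
print(d['segment'].tolist())  # [2, 2, 5]
```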