Question

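I'm trying to check whether a rider's age falls within a given age band, and then assign a binary variable indicating whether it does.

I tried using a lambda function on the column: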

df['<= 18'] = df['rider_age'].apply(lambda x: 1 if x <= 18 else 0)
df['19-24'] = df['rider_age'].apply(lambda x: 1 if x <=19 & x >=24 else 0)
df['25-35'] = df['rider_age'].apply(lambda x: 1 if x <=25 & x >=35 else 0)
df['36-50'] = df['rider_age'].apply(lambda x: 1 if x <=36 & x >=50 else 0)
df['51-59'] = df['rider_age'].apply(lambda x: 1 if x <=51 & x >=59 else 0)
df['60+'] =   df['rider_age'].apply(lambda x: 1 if x >=60 else 0)

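This works for the 18-and-under and 60+ groups, but for the bands in between it marks every age as 0, including ages that should be marked 1.

Does anyone have an idea how to make this work?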

Answer 1

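I'll just fix your code. The conditions in the middle lines are wrong: you need >= & <= rather than <= & >=. Second, you need to wrap each condition in parentheses, like this: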

df['<= 18'] = df['rider_age'].apply(lambda x: 1 if x <= 18 else 0)
df['19-24'] = df['rider_age'].apply(lambda x: 1 if (x >=19) & (x <= 24) else 0)
df['25-35'] = df['rider_age'].apply(lambda x: 1 if (x >=25) & (x <= 35) else 0)
df['36-50'] = df['rider_age'].apply(lambda x: 1 if (x >=36) & (x <= 50) else 0)
df['51-59'] = df['rider_age'].apply(lambda x: 1 if (x >=51) & (x <= 59) else 0)
df['60+'] =   df['rider_age'].apply(lambda x: 1 if x >=60 else 0)
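As a possible alternative to apply, the same flags can be built with vectorized comparisons and Series.between, which includes both endpoints by default. This is only a sketch: the sample ages and the bands dictionary are made up for illustration, and only the column name rider_age comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'rider_age': [16, 18, 19, 24, 30, 50, 55, 60, 72]})

# Inclusive (low, high) bounds for the middle bands.
bands = {'19-24': (19, 24), '25-35': (25, 35), '36-50': (36, 50), '51-59': (51, 59)}

df['<= 18'] = (df['rider_age'] <= 18).astype(int)
for name, (low, high) in bands.items():
    # Series.between is inclusive on both ends by default.
    df[name] = df['rider_age'].between(low, high).astype(int)
df['60+'] = (df['rider_age'] >= 60).astype(int)

print(df)
```

Each row should end up with exactly one flag set to 1, which is a quick way to check that the bands neither overlap nor leave gaps.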

Answer 2

You can use cut + get_dummies to do the check:

s=pd.cut(df['rider_age'],[-np.inf,18,24,35,50,59,np.inf]).astype(str).str.get_dummies()

Then concat the result back onto the original frame:

df=pd.concat([df,s], axis=1)
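For reference, here is a self-contained sketch of the cut + get_dummies approach that also names the columns after the bands in the question. The sample ages and the labels list are assumptions added for illustration, and the bin edges assume whole-number ages (with the default right-closed bins, an age of 18.5 would land in the 19-24 band).

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({'rider_age': [16, 18, 19, 24, 30, 50, 55, 60, 72]})

bins = [-np.inf, 18, 24, 35, 50, 59, np.inf]
labels = ['<= 18', '19-24', '25-35', '36-50', '51-59', '60+']

# right=True (the default) makes each bin include its upper edge, e.g. (18, 24].
age_band = pd.cut(df['rider_age'], bins=bins, labels=labels)

# One 0/1 column per band; cast to int so the flags are 0/1 rather than booleans.
dummies = pd.get_dummies(age_band).astype(int)
dummies.columns = dummies.columns.astype(str)

df = pd.concat([df, dummies], axis=1)
print(df)
```

Passing labels to pd.cut avoids the interval-style column names such as (18.0, 24.0] that the string conversion would otherwise produce.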

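Alternatively, fix your original code by changing the condition from & to and (with the comparisons reversed so that they describe the range 19 to 24):

apply(lambda x: 1 if x >= 19 and x <= 24 else 0)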
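The reason & behaves differently from and here is operator precedence: in Python, & binds more tightly than comparison operators, so an unparenthesized condition is parsed as a chained comparison around 19 & x. The sketch below illustrates this with plain integers; the function names are made up for the example.

```python
def unparenthesized(x):
    # Parsed as the chained comparison: x >= (19 & x) <= 24
    return x >= 19 & x <= 24

def parenthesized(x):
    # Each comparison is evaluated first, then the two booleans are combined.
    return (x >= 19) & (x <= 24)

def with_and(x):
    # "and" has lower precedence than comparisons, so no parentheses are needed.
    return x >= 19 and x <= 24

for age in (18, 20, 30):
    print(age, unparenthesized(age), parenthesized(age), with_and(age))
```

For an age of 30, the unparenthesized version evaluates to True even though 30 is outside the band, while the other two correctly give False.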

Answer 3

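You've mixed up some of the inequalities...

Look at the second condition: x would have to be <= 19 AND >= 24, which no number can satisfy. You meant >= 19 and <= 24, right?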
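A quick way to convince yourself, sketched here over an assumed age range of 0 to 120: the reversed condition matches nothing, while Python's chained comparison expresses the intended band directly.

```python
ages = range(0, 121)  # assumed plausible range of ages

# No age is both <= 19 and >= 24, so the reversed condition never fires.
reversed_matches = [x for x in ages if x <= 19 and x >= 24]

# A chained comparison reads like the intended range and includes both ends.
in_band = [x for x in ages if 19 <= x <= 24]

print(reversed_matches)  # []
print(in_band)           # [19, 20, 21, 22, 23, 24]
```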
