Creating conditional columns in pandas

Time: 2019-05-09 11:03:34

Tags: python-3.x pandas numpy where pandas-groupby

I am trying to create conditional columns in pandas. This is what the DataFrame looks like:

import pandas as pd

data = [{"owner": "john", "dog": "magie", "dog_is_fluffy": 1},
        {"owner": "john", "dog": "stellar", "dog_is_fluffy": 0},
        {"owner": "lisa", "dog": "mollie", "dog_is_fluffy": 0},
        {"owner": "lisa", "dog": "rex", "dog_is_fluffy": 0},
        {"owner": "john", "dog": "luns", "dog_is_fluffy": 1}]
df = pd.DataFrame(data)

As you can see, my data shows dogs and their owners. We also know whether each dog is fluffy. I want to create two columns: fluffy_dogs_owned and owner_has_fluffy_dog.

The result I am looking for is:

data_result = [{"owner": "john", "dog": "magie", "dog_is_fluffy": 1, "fluffy_dogs_owned": 2, "owner_has_fluffy_dog": 1},
               {"owner": "john", "dog": "stellar", "dog_is_fluffy": 0, "fluffy_dogs_owned": 2, "owner_has_fluffy_dog": 1},
               {"owner": "lisa", "dog": "mollie", "dog_is_fluffy": 0, "fluffy_dogs_owned": 0, "owner_has_fluffy_dog": 0},
               {"owner": "lisa", "dog": "rex", "dog_is_fluffy": 0, "fluffy_dogs_owned": 0, "owner_has_fluffy_dog": 0},
               {"owner": "john", "dog": "luns", "dog_is_fluffy": 1, "fluffy_dogs_owned": 2, "owner_has_fluffy_dog": 1}]
df_result = pd.DataFrame(data_result)

I have considered using df.groupby(), but so far I have not been able to make it work. Any ideas?

1 answer:

Answer 0: (score: 2)

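Use GroupBy.transform with sum, which returns a Series the same size as the original DataFrame, then test the new column for "not equal to 0" with Series.ne and cast the boolean result to integer: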

df['fluffy_dogs_owned'] = df.groupby('owner')['dog_is_fluffy'].transform('sum')
df['owner_has_fluffy_dog'] = df['fluffy_dogs_owned'].ne(0).astype(int)
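
To see why transform is used here rather than a plain aggregation, the following minimal sketch compares the two on the question's data. The variable names per_owner and per_row are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Same owners and fluffiness flags as in the question (dog names omitted).
df = pd.DataFrame({
    "owner": ["john", "john", "lisa", "lisa", "john"],
    "dog_is_fluffy": [1, 0, 0, 0, 1],
})

grouped = df.groupby("owner")["dog_is_fluffy"]

# Plain aggregation: one value per owner, indexed by owner.
per_owner = grouped.sum()

# transform: the group sum broadcast back to every original row,
# keeping the original index, so it can be assigned as a new column.
per_row = grouped.transform("sum")

print(per_owner.tolist())  # [2, 0]
print(per_row.tolist())    # [2, 2, 0, 0, 2]
```

Because per_row shares its index with df, assigning it to a new column lines up row by row with no merge step.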

Or use Series.clip to cap the count at 1:

df['owner_has_fluffy_dog'] = df['fluffy_dogs_owned'].clip(upper=1)

print(df)
       dog  dog_is_fluffy owner  fluffy_dogs_owned  owner_has_fluffy_dog
0    magie              1  john                  2                     1
1  stellar              0  john                  2                     1
2   mollie              0  lisa                  0                     0
3      rex              0  lisa                  0                     0
4     luns              1  john                  2                     1
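
For anyone who wants to run the whole thing, here is a self-contained sketch that rebuilds the question's DataFrame, applies both variants from the answer, and compares them with the values the question asks for. The names via_ne, via_clip, expected_counts and expected_flags are illustrative assumptions, not part of the original post.

```python
import pandas as pd

data = [
    {"owner": "john", "dog": "magie", "dog_is_fluffy": 1},
    {"owner": "john", "dog": "stellar", "dog_is_fluffy": 0},
    {"owner": "lisa", "dog": "mollie", "dog_is_fluffy": 0},
    {"owner": "lisa", "dog": "rex", "dog_is_fluffy": 0},
    {"owner": "john", "dog": "luns", "dog_is_fluffy": 1},
]
df = pd.DataFrame(data)

# Number of fluffy dogs per owner, repeated on each of that owner's rows.
df["fluffy_dogs_owned"] = df.groupby("owner")["dog_is_fluffy"].transform("sum")

# Variant 1: nonzero count -> True -> 1.
via_ne = df["fluffy_dogs_owned"].ne(0).astype(int)

# Variant 2: cap the count at 1.
via_clip = df["fluffy_dogs_owned"].clip(upper=1)

df["owner_has_fluffy_dog"] = via_ne

# Values taken from data_result in the question.
expected_counts = [2, 2, 0, 0, 2]
expected_flags = [1, 1, 0, 0, 1]

assert df["fluffy_dogs_owned"].tolist() == expected_counts
assert via_ne.tolist() == expected_flags
assert via_clip.tolist() == expected_flags
```

The clip variant matches the ne variant here because the counts are non-negative integers. If the summed column could hold negative or fractional values, the two would no longer agree.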