Question

Given the following data:

import pandas as pd

x1 = 'one'
x2 = 'two'
x3 = 'three'
y1 = 'yes'
y2 = 'no'
n = 3


df = pd.DataFrame(dict(
    a = [x1]*n + [x2]*n + [x3]*n,
    b = [
        y1,
        y1,
        y2,
        y2,
        y2,
        y2,
        y2,
        y2,
        y1,
    ]
))

which looks like this:

Out[5]:
       a    b
0    one  yes
1    one  yes
2    one   no
3    two   no
4    two   no
5    two   no
6  three   no
7  three   no
8  three  yes

I would like to know whether column c can be created as follows:

Out[5]:
       a    b   c
0    one  yes   1
1    one  yes   1
2    one   no   1
3    two   no   0
4    two   no   0
5    two   no   0
6  three   no   1
7  three   no   1
8  three  yes   1

Here c is defined as 1 if, within the group given by a, b contains 'yes' at least once, and 0 otherwise.

I tried the following:

group_results = df.groupby('a').apply(lambda x:  'yes' in x.b.to_list() )
group_results = group_results.reset_index()
group_results = group_results.rename(columns = {0 : 'c'})
df = pd.merge(df, group_results, left_on = 'a', 
                  right_on = 'a', 
                  how = 'left').copy()

But I feel there should be a better way.

Answer 1

IIUC, you can use groupby + transform with any on a boolean Series that checks whether df['b'] equals 'yes', then chain astype(int) or view to convert the result to integers.

df['c'] = df['b'].eq('yes').groupby(df['a']).transform('any').view('i1')
print(df)
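
The same idea can be spelled out step by step. This is only a sketch: it rebuilds the DataFrame from the question, uses astype(int) rather than view (as far as I know, recent pandas releases deprecate Series.view), and the names is_yes and has_yes are introduced here purely for illustration.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "a": ["one"] * 3 + ["two"] * 3 + ["three"] * 3,
    "b": ["yes", "yes", "no", "no", "no", "no", "no", "no", "yes"],
})

# Boolean Series: True where b is 'yes'
is_yes = df["b"].eq("yes")

# For every row, True if any row in the same 'a' group is 'yes';
# transform keeps the original index, so no merge is needed
has_yes = is_yes.groupby(df["a"]).transform("any")

# Convert the boolean mask to 0/1 integers
df["c"] = has_yes.astype(int)
print(df)
```

Because transform returns a result aligned with the original rows, it can be assigned directly as a new column, which replaces the apply + reset_index + merge steps from the question.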

Answer 2

Use Series.isin to test which values of a belong to a group with at least one yes in column b, then convert the mask to integers with Series.view:

df['c'] = df['a'].isin(df.loc[df['b'].eq('yes'), 'a']).view('i1')
print(df)
       a    b  c
0    one  yes  1
1    one  yes  1
2    one   no  1
3    two   no  0
4    two   no  0
5    two   no  0
6  three   no  1
7  three   no  1
8  three  yes  1

Details:

print(df.loc[df['b'].eq('yes'), 'a'])
0      one
1      one
8    three
Name: a, dtype: object
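
Putting the pieces together, a minimal self-contained sketch of this approach might look like the following. It assumes the DataFrame from the question and swaps view for astype(int); the names groups_with_yes and mask are hypothetical helpers added for readability.

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "a": ["one"] * 3 + ["two"] * 3 + ["three"] * 3,
    "b": ["yes", "yes", "no", "no", "no", "no", "no", "no", "yes"],
})

# Values of 'a' on the rows where b is 'yes' (duplicates are harmless for isin)
groups_with_yes = df.loc[df["b"].eq("yes"), "a"]

# True for every row whose 'a' value appears among those groups
mask = df["a"].isin(groups_with_yes)

# Convert the boolean mask to 0/1 integers
df["c"] = mask.astype(int)
print(df)
```

This avoids groupby entirely: it first collects the group keys that have a 'yes' and then does a membership test against them.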

How to create a pandas DataFrame vector based on groupby values
