I have the following df:
cluster_id dummy
1 False
1 True
1 True
2 False
2 False
3 False
3 True
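I want to create a boolean column 'dummy_display' that is set to True if each cluster has at least one False and the number of dummy == True is less than the length of the cluster, so the result should look like: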
cluster_id dummy dummy_display
1 False False
1 True False
1 True False
2 False True
2 False True
3 False False
3 True False
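To reproduce the example, here is a minimal sketch that builds the sample frame from the table above and counts, per cluster, the True rows, the False rows and the cluster length. The summary column names (n_true, n_false, length) are illustrative only. In the expected output, dummy_display is True only for cluster 2, which is the one cluster with no True rows.

```python
import pandas as pd

# Sample frame taken from the table in the question
df = pd.DataFrame({
    "cluster_id": [1, 1, 1, 2, 2, 3, 3],
    "dummy": [False, True, True, False, False, False, True],
})

# Per cluster: number of True rows, number of False rows, and cluster length
summary = df.groupby("cluster_id")["dummy"].agg(
    n_true="sum",
    n_false=lambda s: (~s).sum(),
    length="size",
)
print(summary)
```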
Answer 0 (score: 3)
Use transform with any:
In [137]: ~df.groupby('cluster_id')['dummy'].transform('any')
Out[137]:
0 False
1 False
2 False
3 True
4 True
5 False
6 False
Name: dummy, dtype: bool
In [139]: df['dummy_display'] = ~df.groupby('cluster_id')['dummy'].transform('any')
In [140]: df
Out[140]:
cluster_id dummy dummy_display
0 1 False False
1 1 True False
2 1 True False
3 2 False True
4 2 False True
5 3 False False
6 3 True False
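A self-contained version of the transform approach, as a sketch assuming the sample frame from the question. The second computation, transform('sum').eq(0), is an equivalent formulation added here for comparison and is not part of the original answer: it phrases the condition as "the cluster contains no True rows".

```python
import pandas as pd

df = pd.DataFrame({
    "cluster_id": [1, 1, 1, 2, 2, 3, 3],
    "dummy": [False, True, True, False, False, False, True],
})

# transform('any') broadcasts "does this cluster contain a True?" back to every row;
# ~ flips it, so rows belonging to all-False clusters become True.
df["dummy_display"] = ~df.groupby("cluster_id")["dummy"].transform("any")

# Equivalent via a count: the number of True rows in the cluster is zero.
via_count = df.groupby("cluster_id")["dummy"].transform("sum").eq(0)

print(df)
print(via_count.tolist() == df["dummy_display"].tolist())
```

Both versions are vectorized group operations; the count-based one reads closer to the wording of the question, while any stops being meaningful only when the column is not boolean.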
Answer 1 (score: 2)
@Zero's answer is simpler and should be the go-to approach. But I couldn't resist offering a NumPy alternative.
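import numpy as np
import pandas as pd
i, u = pd.factorize(df.cluster_id)       # integer code per row, plus the unique IDs
a = np.zeros(len(u), dtype=bool)         # one "contains a True?" flag per cluster
np.logical_or.at(a, i, df.dummy.values)  # OR each row's dummy into its cluster's flag
df.assign(dummy_display=~a[i])           # broadcast back to rows and negate
cluster_id dummy dummy_display
0 1 False False
1 1 True False
2 1 True False
3 2 False True
4 2 False True
5 3 False False
6 3 True False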
pandas.factorize creates an array of integers that represent the unique values in df.cluster_id:
i, u = pd.factorize(df.cluster_id)
print(f"factorization (i): {i.tolist()}\nunique values (u): {u.tolist()}")
factorization (i): [0, 0, 0, 1, 1, 2, 2]
unique values (u): [1, 2, 3]
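The codes from factorize are assigned in order of first appearance, so the trick does not depend on the cluster IDs being sorted, small, or contiguous. A small sketch with hypothetical IDs (the values 10, 3 and 7 are made up for illustration), including the sort=True variant:

```python
import pandas as pd

# Hypothetical cluster IDs that are neither sorted nor contiguous
ids = pd.Series([10, 3, 10, 7, 3])

codes, uniques = pd.factorize(ids)
print(codes.tolist())    # [0, 1, 0, 2, 1]  codes follow order of first appearance
print(uniques.tolist())  # [10, 3, 7]

# uniques[codes] maps every code back to its original ID
print(uniques[codes].tolist())  # [10, 3, 10, 7, 3]

# With sort=True the uniques are sorted and the codes follow that order
codes_s, uniques_s = pd.factorize(ids, sort=True)
print(codes_s.tolist(), uniques_s.tolist())  # [2, 0, 2, 1, 0] [3, 7, 10]
```

Because the codes run from 0 to len(uniques) - 1, they can index directly into a NumPy array of length len(uniques), which is what the accumulator a relies on.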
Then we initialize a False for each unique cluster_id value:
a = np.zeros(len(u), dtype=bool)
print(f"accumulated `or` init (a): {a.tolist()}")
accumulated `or` init (a): [False, False, False]
Then np.logical_or.at accumulates with or logic: for each index in i, it ORs the corresponding boolean value into that slot of a:
np.logical_or.at(a, i, df.dummy.values)
print(f"accumulated `or` post (a): {a.tolist()}")
print(f"broadcast over factorization (a[i]):\n {a[i].tolist()}")
accumulated `or` post (a): [True, False, True]
broadcast over factorization (a[i]):
[True, True, True, False, False, True, True]
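The reason for np.logical_or.at rather than a plain fancy-index update is that ufunc.at is unbuffered: every (index, value) pair is applied, even when an index repeats. A minimal sketch with made-up indices and values that contrasts the two:

```python
import numpy as np

idx = np.array([0, 0, 1])              # index 0 appears twice
vals = np.array([True, False, False])

# Unbuffered: every pair is applied, so a[0] ends up False or True or False == True
a_at = np.zeros(2, dtype=bool)
np.logical_or.at(a_at, idx, vals)
print(a_at.tolist())  # [True, False]

# Buffered fancy-index update: a[idx] is read once, OR-ed, then written back,
# so with a repeated index only one of the writes to a[0] survives.
a_buf = np.zeros(2, dtype=bool)
a_buf[idx] |= vals
print(a_buf.tolist())
```

In the buffered version the True contributed by the first row can be overwritten by the later write for the same index, which is exactly the per-group OR the answer needs to avoid.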
Let's take a closer look. I'll loop through the rows and show the group-wise accumulator variable a after each step:
a = [False, False, False]
print(f"accumulate `or` init (a): {a}", end='\n\n')
d = df.assign(i=i, a=None)[['cluster_id', 'i', 'dummy', 'a']]
for j in d.index:
    # OR this row's dummy into its cluster's slot, then snapshot the accumulator
    a[d.at[j, 'i']] |= bool(d.at[j, 'dummy'])
    d.at[j, 'a'] = [*a]
d
cluster_id i dummy a
0 1 0 False [False, False, False]
1 1 0 True [True, False, False]
2 1 0 True [True, False, False]
3 2 1 False [True, False, False]
4 2 1 False [True, False, False]
5 3 2 False [True, False, False]
6 3 2 True [True, False, True]
At each row, a[i] becomes a[i] or dummy (a[0] for rows 0 to 2, a[1] for rows 3 and 4, a[2] for rows 5 and 6); column a holds a snapshot of the accumulator after that row.
This gives the same broadcast as shown above:
print(f"result (a): {a}\nbroadcasted (a[i]):\n {[a[j] for j in i]}")
result (a): [True, False, True]
broadcasted (a[i]):
[True, True, True, False, False, True, True]
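Since the question is phrased in terms of the number of dummy == True rows per cluster, a counting variant can also be written with np.bincount. This is a swapped-in technique, not the method of the answer above: a sketch assuming the sample frame from the question, where the summed weights give the True count per factorized cluster and a zero count marks the rows to display.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "cluster_id": [1, 1, 1, 2, 2, 3, 3],
    "dummy": [False, True, True, False, False, False, True],
})

i, u = pd.factorize(df["cluster_id"])

# Sum of weights per code: 1.0 for True rows, 0.0 for False rows
n_true = np.bincount(i, weights=df["dummy"].to_numpy().astype(float), minlength=len(u))

# A row is displayed when its cluster has no True rows; n_true[i] broadcasts back to rows
df["dummy_display"] = n_true[i] == 0

# Cross-check against the groupby/transform answer
expected = ~df.groupby("cluster_id")["dummy"].transform("any")
print(df)
print((df["dummy_display"] == expected).all())
```

bincount does the grouping in one vectorized pass, like logical_or.at, but produces counts, so the same array can also answer related questions (for example, comparing the True count against the cluster length).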