I want to run a groupby on df and then assign an ID to every group whose size is > 1:
df_gr = df.groupby(['a', 'b', 'c'])
df_filtered = df_gr.filter(lambda x: len(x) > 1)
if df_filtered.shape[0] == 0:
    df_filtered['id'] = -1
else:
    # put ids in df_filtered
    pass
I would like to know how to do this. For example, given this df:
a b c d
10 2017 20.0 231
10 2017 20.0 223
20 2018 10.0 113
30 2017 11.0 134
30 2017 11.0 112
30 2017 11.0 111
Resulting df:
a b c d id
10 2017 20.0 231 1
10 2017 20.0 223 1
30 2017 11.0 134 2
30 2017 11.0 112 2
30 2017 11.0 111 2
if df_filtered.shape[0] != 0:
    df_filtered["id"] = df_filtered.groupby(
        ['a', 'b', 'c']).grouper.group_info[0]
Answer 0 (score: 1)
I think you need transform together with numpy.where:
import numpy as np
# -1 for rows in groups with more than one row, 2 otherwise
df['id'] = np.where(df.groupby(['a', 'b', 'c'])['a'].transform('size') > 1, -1, 2)
print(df)
    a     b     c    d  id
0  10  2017  20.0  231  -1
1  10  2017  20.0  223  -1
2  20  2018  10.0  113   2
3  30  2017  11.0  134  -1
4  30  2017  11.0  112  -1
5  30  2017  11.0  111  -1
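As a minimal sketch of why this works, the snippet below rebuilds the sample df from the question and shows that transform('size') returns one value per row, namely the size of that row's group, so np.where can choose a value row by row. The variable names sizes and flags are only for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'a': [10, 10, 20, 30, 30, 30],
    'b': [2017, 2017, 2018, 2017, 2017, 2017],
    'c': [20.0, 20.0, 10.0, 11.0, 11.0, 11.0],
    'd': [231, 223, 113, 134, 112, 111],
})

# Size of each row's group, aligned with the original index
sizes = df.groupby(['a', 'b', 'c'])['a'].transform('size')
print(sizes.tolist())  # [2, 2, 1, 3, 3, 3]

# -1 where the group has more than one row, 2 otherwise
flags = np.where(sizes > 1, -1, 2)
print(flags.tolist())  # [-1, -1, 2, -1, -1, -1]
```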
If you want 1 and 0 values instead, another solution is to cast the boolean mask to integers:
# option 1: numpy.where
df['id'] = np.where(df.groupby(['a', 'b', 'c'])['a'].transform('size') > 1, 1, 0)
# option 2: cast the boolean mask to int (same result)
df['id'] = (df.groupby(['a', 'b', 'c'])['a'].transform('size') > 1).astype(int)
print(df)
    a     b     c    d  id
0  10  2017  20.0  231   1
1  10  2017  20.0  223   1
2  20  2018  10.0  113   0
3  30  2017  11.0  134   1
4  30  2017  11.0  112   1
5  30  2017  11.0  111   1
EDIT: I think you need GroupBy.ngroup:
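# size of each row's group
df['id'] = df.groupby(['a', 'b', 'c'])['a'].transform('size')
# keep only rows from groups with more than one row
# (copy so the later assignment does not trigger SettingWithCopyWarning)
df = df[df['id'] > 1].copy()
# sequential id values, starting at 1
df['id'] = df.groupby(['a', 'b', 'c'])['a'].ngroup() + 1
print(df)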
    a     b     c    d  id
0  10  2017  20.0  231   1
1  10  2017  20.0  223   1
3  30  2017  11.0  134   2
4  30  2017  11.0  112   2
5  30  2017  11.0  111   2
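For reference, the ngroup approach can be wrapped up as a self-contained sketch that also covers the case from the question where no group has more than one row. The helper name assign_group_ids and the explicit empty-frame branch are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

def assign_group_ids(df, keys):
    """Keep rows whose group has more than one row and number those groups from 1.

    Hypothetical helper, written only to illustrate the approach.
    """
    # Size of each row's group, aligned with df's index
    sizes = df.groupby(keys)[keys[0]].transform('size')
    out = df[sizes > 1].copy()
    if out.empty:
        # Nothing to number: return an empty frame that still has an 'id' column
        out['id'] = pd.Series(dtype='int64')
        return out
    # ngroup() numbers groups from 0 (in sorted key order by default), so add 1
    out['id'] = out.groupby(keys).ngroup() + 1
    return out

# Sample data copied from the question
df = pd.DataFrame({
    'a': [10, 10, 20, 30, 30, 30],
    'b': [2017, 2017, 2018, 2017, 2017, 2017],
    'c': [20.0, 20.0, 10.0, 11.0, 11.0, 11.0],
    'd': [231, 223, 113, 134, 112, 111],
})

result = assign_group_ids(df, ['a', 'b', 'c'])
print(result['id'].tolist())  # [1, 1, 2, 2, 2]

# A frame where every group has exactly one row yields an empty result
singles = assign_group_ids(df.iloc[[0, 2, 3]], ['a', 'b', 'c'])
print(len(singles))  # 0
```

Note that with the default sort=True the ids follow the sorted order of the group keys, which happens to match the order of first appearance in this sample.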