pandas: how to assign group IDs to groups with size > 1

Time: 2018-04-25 10:10:20

Tags: python-3.x pandas dataframe

I want to perform a groupby on df and then assign an ID to each group whose size is > 1:

df_gr = df.groupby(['a', 'b', 'c'])

df_filtered = df_gr.filter(lambda x: len(x) > 1)

if df_filtered.shape[0] == 0:
    df_filtered['id'] = -1
else:
    # put ids in df_filtered
    pass  # placeholder so the snippet parses

I am wondering how to do that. The sample df:

a    b        c        d      
10   2017     20.0     231    
10   2017     20.0     223    
20   2018     10.0     113    
30   2017     11.0     134    
30   2017     11.0     112    
30   2017     11.0     111    

The resulting df:

 a    b        c        d      id
10   2017     20.0     231     1
10   2017     20.0     223     1
30   2017     11.0     134     2
30   2017     11.0     112     2
30   2017     11.0     111     2

if df_filtered.shape[0] != 0:
    df_filtered["id"] = df_filtered.groupby(
        ['a', 'b', 'c']).grouper.group_info[0]

1 Answer:

Answer 0 (score: 1)

I think you need transform with numpy.where:

import numpy as np
# -1 for rows whose (a, b, c) group has more than one row, 2 otherwise
df['id'] = np.where(df.groupby(['a', 'b', 'c'])['a'].transform('size') > 1, -1, 2)
print (df)
    a     b     c    d  id
0  10  2017  20.0  231  -1
1  10  2017  20.0  223  -1
2  20  2018  10.0  113   2
3  30  2017  11.0  134  -1
4  30  2017  11.0  112  -1
5  30  2017  11.0  111  -1
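
To see why this works, it can help to look at the intermediate values. The sketch below rebuilds the sample frame from the question (the names `sizes` and `mask` are only illustrative) and shows that `transform('size')` returns one value per row, aligned with the original index, so the comparison gives a row-wise boolean mask that `np.where` can consume.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'a': [10, 10, 20, 30, 30, 30],
    'b': [2017, 2017, 2018, 2017, 2017, 2017],
    'c': [20.0, 20.0, 10.0, 11.0, 11.0, 11.0],
    'd': [231, 223, 113, 134, 112, 111],
})

# Size of the (a, b, c) group each row belongs to, one value per row
sizes = df.groupby(['a', 'b', 'c'])['a'].transform('size')
# True where the row's group has more than one member
mask = sizes > 1

print(sizes.tolist())  # [2, 2, 1, 3, 3, 3]
print(mask.tolist())   # [True, True, False, True, True, True]
```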

If you want 1 and 0 values, another solution is to cast the boolean mask to integers:

# the two lines below are alternatives that give the same values; use either one
df['id'] = np.where(df.groupby(['a', 'b', 'c'])['a'].transform('size') > 1, 1, 0)
df['id'] = (df.groupby(['a', 'b', 'c'])['a'].transform('size') > 1).astype(int)
print (df)
    a     b     c    d  id
0  10  2017  20.0  231   1
1  10  2017  20.0  223   1
2  20  2018  10.0  113   0
3  30  2017  11.0  134   1
4  30  2017  11.0  112   1
5  30  2017  11.0  111   1

EDIT: I think you need GroupBy.ngroup:

#create values by size of groups
df['id'] = df.groupby(['a', 'b', 'c'])['a'].transform('size')

#filter out rows (copy so the assignment below does not hit a slice of the original)
df = df[df['id'] > 1].copy()
#sequential id values
df['id'] = df.groupby(['a', 'b', 'c'])['a'].ngroup() + 1
print (df)
    a     b     c    d  id
0  10  2017  20.0  231   1
1  10  2017  20.0  223   1
3  30  2017  11.0  134   2
4  30  2017  11.0  112   2
5  30  2017  11.0  111   2
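
Putting the steps together, here is a minimal self-contained sketch that reproduces the expected result from the question. The helper name `assign_group_ids` is hypothetical, not a pandas function, and the sketch assumes the default key sorting of `groupby`, so ids follow the sorted order of the group keys.

```python
import pandas as pd

def assign_group_ids(df, keys):
    """Hypothetical helper: keep rows whose group has more than one
    member and number the remaining groups 1, 2, 3, ..."""
    # Group size for every row, aligned with the original index
    sizes = df.groupby(keys)[keys[0]].transform('size')
    # Keep only rows from groups with more than one member
    out = df[sizes > 1].copy()
    # ngroup() numbers groups from 0, so shift to start at 1
    out['id'] = out.groupby(keys).ngroup() + 1
    return out

# Sample data copied from the question
df = pd.DataFrame({
    'a': [10, 10, 20, 30, 30, 30],
    'b': [2017, 2017, 2018, 2017, 2017, 2017],
    'c': [20.0, 20.0, 10.0, 11.0, 11.0, 11.0],
    'd': [231, 223, 113, 134, 112, 111],
})

result = assign_group_ids(df, ['a', 'b', 'c'])
print(result['id'].tolist())  # [1, 1, 2, 2, 2]
```

The original row labels (0, 1, 3, 4, 5) are kept; call `reset_index(drop=True)` on the result if a fresh 0-based index is preferred.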