Finding unique records in pandas using a combination of multiple columns

Time: 2017-09-21 06:22:13

Tags: python pandas unique

Suppose I have this pandas DataFrame,

df

street_id district_id region_id value1 value2 
   1          6          8        7      5
   1          5          8        9      3
   2          6          5        8      0
   2          6          5        6      2
   3          4          8        5      1
   3          7          9        0      2

The expected output is,

street_id district_id region_id
   2          6          5            
   3          4          8       
   3          7          9   

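I want to select only the street records that are unique within a region. I can't simply take the unique combinations of street_id & region_id, because I also need district_id. How can I do this?

Here a street counts as unique if it belongs to only one district within a region.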

1 Answer:

Answer 0: (score: 2)

IIUC:

In [15]: df.assign(x=df.groupby(['region_id','street_id'])['district_id']
    ...:                    .transform('nunique')) \
    ...:   .query("x == 1") \
    ...:   .drop_duplicates(subset=['street_id','region_id']) \
    ...:   .drop('x', axis=1)
Out[15]:
   street_id  district_id  region_id  value1  value2
2          2            6          5       8       0
4          3            4          8       5       1
5          3            7          9       0       2

A better and shorter version, proposed by @Zero:

df[df.groupby(['region_id','street_id'])['district_id']
     .transform('nunique').eq(1)] \
  .drop_duplicates(subset=['street_id','region_id'])
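For anyone who wants to run the shorter version end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question; the names `n_districts` and `result` are only illustrative and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "street_id":   [1, 1, 2, 2, 3, 3],
    "district_id": [6, 5, 6, 6, 4, 7],
    "region_id":   [8, 8, 5, 5, 8, 9],
    "value1":      [7, 9, 8, 6, 5, 0],
    "value2":      [5, 3, 0, 2, 1, 2],
})

# Number of distinct districts per (region, street) pair,
# broadcast back so it lines up with the rows of df.
n_districts = (
    df.groupby(["region_id", "street_id"])["district_id"]
      .transform("nunique")
)

# Keep rows whose street lies in exactly one district of its region,
# then keep the first row for each (street, region) pair.
result = (
    df[n_districts.eq(1)]
      .drop_duplicates(subset=["street_id", "region_id"])
)
print(result)
```

Because the mask is a boolean Series aligned with `df`, no helper column is needed, so there is nothing to drop afterwards.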

分解:

In [16]: df.groupby(['region_id','street_id'])['district_id'].transform('nunique')
Out[16]:
0    2
1    2
2    1
3    1
4    1
5    1
Name: district_id, dtype: int64

In [17]: df.assign(x=df.groupby(['region_id','street_id'])['district_id'].transform('nunique'))
Out[17]:
   street_id  district_id  region_id  value1  value2  x
0          1            6          8       7       5  2
1          1            5          8       9       3  2
2          2            6          5       8       0  1
3          2            6          5       6       2  1
4          3            4          8       5       1  1
5          3            7          9       0       2  1

In [18]: df.assign(x=df.groupby(['region_id','street_id'])['district_id'].transform('nunique')) \
    ...:   .query("x == 1") \
    ...:
Out[18]:
   street_id  district_id  region_id  value1  value2  x
2          2            6          5       8       0  1
3          2            6          5       6       2  1
4          3            4          8       5       1  1
5          3            7          9       0       2  1

In [19]: df.assign(x=df.groupby(['region_id','street_id'])['district_id'].transform('nunique')) \
    ...:   .query("x == 1") \
    ...:   .drop_duplicates(subset=['street_id','region_id']) \
    ...:
Out[19]:
   street_id  district_id  region_id  value1  value2  x
2          2            6          5       8       0  1
4          3            4          8       5       1  1
5          3            7          9       0       2  1
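The outputs above still carry value1, value2 and the helper column x, while the expected output in the question lists only the three id columns. One possible way to get exactly that shape is sketched below. It assumes the same sample data as the question; `key`, `mask` and `expected` are hypothetical names chosen for this example.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "street_id":   [1, 1, 2, 2, 3, 3],
    "district_id": [6, 5, 6, 6, 4, 7],
    "region_id":   [8, 8, 5, 5, 8, 9],
    "value1":      [7, 9, 8, 6, 5, 0],
    "value2":      [5, 3, 0, 2, 1, 2],
})

key = ["region_id", "street_id"]

# True for rows whose (region, street) pair maps to a single district.
mask = df.groupby(key)["district_id"].transform("nunique").eq(1)

# Select only the id columns first; after that, fully duplicated rows
# are exactly the repeated (street, district, region) combinations.
expected = (
    df.loc[mask, ["street_id", "district_id", "region_id"]]
      .drop_duplicates()
      .reset_index(drop=True)
)
print(expected)
```

Since every kept group has a single district_id, calling `drop_duplicates()` on the three id columns removes the same rows as `drop_duplicates(subset=['street_id','region_id'])` does in the answer.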