Suppose I have this pandas DataFrame,
df
street_id district_id region_id value1 value2
1 6 8 7 5
1 5 8 9 3
2 6 5 8 0
2 6 5 6 2
3 4 8 5 1
3 7 9 0 2
The expected output is,
street_id district_id region_id
2 6 5
3 4 8
3 7 9
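I want to select only the records of streets that are unique within a region. I can't simply take the unique combinations of street_id & region_id, because I need district_id as well. How can I do this?
Here, a street counts as unique if it belongs to only one district within a region.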
Answer 0 (score: 2)
IIUC:
In [15]: df.assign(x=df.groupby(['region_id','street_id'])['district_id']
    ...:             .transform('nunique')) \
    ...:   .query("x == 1") \
    ...:   .drop_duplicates(subset=['street_id','region_id']) \
    ...:   .drop(columns='x')
Out[15]:
street_id district_id region_id value1 value2
2 2 6 5 8 0
4 3 4 8 5 1
5 3 7 9 0 2
Or a better and shorter version, proposed by @Zero:
df[df.groupby(['region_id','street_id'])['district_id']
.transform('nunique').eq(1)] \
.drop_duplicates(subset=['street_id','region_id'])
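For anyone who wants to run this end to end, here is a minimal self-contained sketch of the shorter version. The DataFrame is rebuilt from the sample in the question; the names `n_districts` and `result` are only illustrative, and the final column selection is an assumption made to match the three columns shown in the expected output.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "street_id":   [1, 1, 2, 2, 3, 3],
    "district_id": [6, 5, 6, 6, 4, 7],
    "region_id":   [8, 8, 5, 5, 8, 9],
    "value1":      [7, 9, 8, 6, 5, 0],
    "value2":      [5, 3, 0, 2, 1, 2],
})

# For every row, count the distinct districts seen for its (region, street) pair.
n_districts = (
    df.groupby(["region_id", "street_id"])["district_id"].transform("nunique")
)

# Keep streets that map to exactly one district within their region,
# then keep one row per (street, region) pair.
result = (
    df[n_districts.eq(1)]
    .drop_duplicates(subset=["street_id", "region_id"])
    [["street_id", "district_id", "region_id"]]
)
print(result)
```

Dropping the last selection step keeps value1 and value2 as well, as in Out[15] above.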
Breakdown:
In [16]: df.groupby(['region_id','street_id'])['district_id'].transform('nunique')
Out[16]:
0 2
1 2
2 1
3 1
4 1
5 1
Name: district_id, dtype: int64
In [17]: df.assign(x=df.groupby(['region_id','street_id'])['district_id'].transform('nunique'))
Out[17]:
street_id district_id region_id value1 value2 x
0 1 6 8 7 5 2
1 1 5 8 9 3 2
2 2 6 5 8 0 1
3 2 6 5 6 2 1
4 3 4 8 5 1 1
5 3 7 9 0 2 1
In [18]: df.assign(x=df.groupby(['region_id','street_id'])['district_id'].transform('nunique')) \
    ...:   .query("x == 1")
Out[18]:
street_id district_id region_id value1 value2 x
2 2 6 5 8 0 1
3 2 6 5 6 2 1
4 3 4 8 5 1 1
5 3 7 9 0 2 1
In [19]: df.assign(x=df.groupby(['region_id','street_id'])['district_id'].transform('nunique')) \
    ...:   .query("x == 1") \
    ...:   .drop_duplicates(subset=['street_id','region_id'])
Out[19]:
street_id district_id region_id value1 value2 x
2 2 6 5 8 0 1
4 3 4 8 5 1 1
5 3 7 9 0 2 1
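As a possible alternative (not part of the answer above), the same selection can probably be written with `groupby(...).filter(...)`, which keeps whole groups that satisfy a predicate. This is only a sketch: the DataFrame is rebuilt from the question's sample, and `unique_streets` is an illustrative name.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "street_id":   [1, 1, 2, 2, 3, 3],
    "district_id": [6, 5, 6, 6, 4, 7],
    "region_id":   [8, 8, 5, 5, 8, 9],
    "value1":      [7, 9, 8, 6, 5, 0],
    "value2":      [5, 3, 0, 2, 1, 2],
})

# Keep only the (region, street) groups that contain a single distinct district,
# then reduce each surviving group to one row.
unique_streets = (
    df.groupby(["region_id", "street_id"])
      .filter(lambda g: g["district_id"].nunique() == 1)
      .drop_duplicates(subset=["street_id", "region_id"])
)
print(unique_streets[["street_id", "district_id", "region_id"]])
```

Because `filter` calls the Python lambda once per group, the `transform('nunique')` approach is likely to be faster when there are many groups.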