I have the following dataset in Table_Record:
Seg_ID Lock_ID Code
111 100 1
222 121 2
333 341 2
444 100 1
555 100 1
666 341 2
777 554 4
888 332 5
I am using a SQL query to find the Seg_IDs whose Lock_ID is duplicated:
Select Code,Lock_ID,Seg_ID from Table_Record group by Code, Lock_ID;
Seg_ID Lock_ID Code
111 100 1
444 100 1
555 100 1
222 121 2
333 341 2
666 341 2
777 554 4
888 332 5
How can I achieve the same with Pandas?
Expected output from Pandas is, e.g.:
Seg_ID (111,444,555) has Lock_id (1).
Seg_ID (222,333,666) has Lock_ID (2).
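For readers who want to reproduce the answers below, here is a minimal sketch that builds the sample table from the question as a pandas DataFrame (the name `df` is an assumption; the answers refer to the data by that name):

```python
import pandas as pd

# Build the sample table from the question as a DataFrame.
df = pd.DataFrame({
    'Seg_ID':  [111, 222, 333, 444, 555, 666, 777, 888],
    'Lock_ID': [100, 121, 341, 100, 100, 341, 554, 332],
    'Code':    [1, 2, 2, 1, 1, 2, 4, 5],
})
print(df)
```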
Answer 0 (score: 2)
First get all codes by filtering only the duplicated values, then filter the original DataFrame with boolean indexing using isin:
codes = df.loc[df.duplicated(['Lock_ID']), 'Code'].unique()
df1 = df[df['Code'].isin(codes)]
print (df1)
Seg_ID Lock_ID Code
0 111 100 1
1 222 121 2
2 333 341 2
3 444 100 1
4 555 100 1
5 666 341 2
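To make the intermediate step visible, the sketch below (which rebuilds the sample data so it is self-contained) shows what `duplicated` and `unique` produce here. By default `duplicated(keep='first')` flags only the second and later occurrences of each Lock_ID:

```python
import pandas as pd

df = pd.DataFrame({
    'Seg_ID':  [111, 222, 333, 444, 555, 666, 777, 888],
    'Lock_ID': [100, 121, 341, 100, 100, 341, 554, 332],
    'Code':    [1, 2, 2, 1, 1, 2, 4, 5],
})

# duplicated() marks the second and later occurrences of each Lock_ID,
# so the repeated rows of Lock_ID 100 and 341 are flagged.
mask = df.duplicated(['Lock_ID'])
codes = df.loc[mask, 'Code'].unique()
print(codes)  # → [1 2]
```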
Then use groupby with f-strings:
for k, v in df1.groupby(['Code'])['Seg_ID']:
print (f'Seg_ID {tuple(v)} has Code ({k})')
Seg_ID (111, 444, 555) has Code (1)
Seg_ID (222, 333, 666) has Code (2)
If you want DataFrame-like output, use tuple inside apply:
df2 = df1.groupby(['Code'])['Seg_ID'].apply(tuple).reset_index()
print (df2)
Code Seg_ID
0 1 (111, 444, 555)
1 2 (222, 333, 666)
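If plain strings are preferred over tuples, for example to match the question's `Seg_ID (111,444,555)` formatting exactly, the values can be joined instead. A sketch under the same assumptions, rebuilding the filtered frame `df1` so it runs standalone:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Seg_ID':  [111, 444, 555, 222, 333, 666],
    'Lock_ID': [100, 100, 100, 121, 341, 341],
    'Code':    [1, 1, 1, 2, 2, 2],
})

# Join the Seg_IDs of each Code group into a single formatted string.
out = (df1.groupby('Code')['Seg_ID']
          .apply(lambda s: '(' + ','.join(map(str, s)) + ')')
          .reset_index())
print(out)
```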
Answer 1 (score: 0)
Just use groupby. From your code, it seems you want:
grouped = df.groupby(['Code', 'Lock_ID'])
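A runnable sketch of this grouping idea, assuming the intent was to group by Code and Lock_ID and collect the Seg_IDs per group (sample data rebuilt here for self-containment):

```python
import pandas as pd

df = pd.DataFrame({
    'Seg_ID':  [111, 222, 333, 444, 555, 666, 777, 888],
    'Lock_ID': [100, 121, 341, 100, 100, 341, 554, 332],
    'Code':    [1, 2, 2, 1, 1, 2, 4, 5],
})

# Group by both Code and Lock_ID, then list the Seg_IDs in each group.
grouped = df.groupby(['Code', 'Lock_ID'])['Seg_ID'].apply(list).reset_index()
print(grouped)
```

Note that unlike the accepted answer, this keeps every (Code, Lock_ID) pair, including the non-duplicated ones such as Lock_ID 554 and 332.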