Question

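I am using SQL Server with pyodbc and Python. I want to build a list of lists, a dictionary, a pandas DataFrame, or similar, that groups the unique IDs of rows which have duplicate values across several columns. For example, I have a table like this: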

 ID          size      page       rate
12345         6         12         20  
67890         6         12         20
23456         4         10         15
87654         4         10         15
43210         4         10         15
....

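Rows 1-2 and rows 3-5 have duplicate values in the size, page, and rate columns. So I need to group the IDs together like this (for example, as a list of lists):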

duplicates = [[12345, 67890], [23456, 87654, 43210],...]

After executing the cursor, I get the first table above as the result. This is what I tried:

duplicates = []
row1 = [row[1] for row in cursor]
row2 = [row[1] for row in cursor]
counter = 1
index = 0
for row in cursor:
    if index <= len(row1) - 2:
        n0 = row1[index]
        n1 = row2[index]
        n2 = row1[index + 1]
        n3 = row2[index + 1]
        if n0 == n1 and n2 == n3:
            duplicates.append(row[0])
        else:
            counter += 1
        index += 1
    else:
        break

This doesn't work, but any help or guidance would be greatly appreciated! Thanks!

Answer 1

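You can convert the query result to a pandas DataFrame (assume it is named "df"). After that, you can extract the duplicate IDs with the following line.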

df1 = df.groupby('size')['ID'].apply(list).reset_index(name='duplicates')
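
Note that the line above groups on size alone, while the question asks for rows that match on size, page, and rate together. A minimal sketch of that variant follows. It assumes the DataFrame has the columns shown in the question. The sample data is hard-coded rather than read from SQL Server, and the last row (ID 11111) is a made-up non-duplicate added only to show the filtering step.

```python
import pandas as pd

# Sample data mirroring the table in the question. In practice df would be
# built from the query result. The row with ID 11111 is hypothetical and is
# included only to show that unique rows are dropped.
df = pd.DataFrame(
    {
        "ID": [12345, 67890, 23456, 87654, 43210, 11111],
        "size": [6, 6, 4, 4, 4, 9],
        "page": [12, 12, 10, 10, 10, 30],
        "rate": [20, 20, 15, 15, 15, 50],
    }
)

key_cols = ["size", "page", "rate"]

# Collect the IDs of every (size, page, rate) combination into a list.
# sort=False keeps the groups in order of first appearance.
grouped = df.groupby(key_cols, sort=False)["ID"].apply(list)

# Keep only the combinations that are shared by more than one ID.
duplicates = [ids for ids in grouped if len(ids) > 1]

print(duplicates)
```

For the sample data, duplicates holds the two groups from the question, [12345, 67890] and [23456, 87654, 43210], and the hypothetical ID 11111 is left out because no other row shares its values.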
