Question

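The first column of the DataFrame is a list of random student IDs. I want to know whether any student ID occurs twice. If so, I want to print the two rows where it occurs.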

StudentID   Name
s123456     Michael
s123789     Peter
s123789     Thomas 
s123579     Marie

I want to print:

"Two students have the same student id in line {} and {}"
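If the goal is only to see the offending rows, a boolean mask from `Series.duplicated(keep=False)` may be enough. A minimal sketch, assuming the data is loaded into a DataFrame named `df` with the two columns shown above:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "StudentID": ["s123456", "s123789", "s123789", "s123579"],
    "Name": ["Michael", "Peter", "Thomas", "Marie"],
})

# keep=False marks every occurrence of a repeated ID, not just the later ones
mask = df["StudentID"].duplicated(keep=False)
dup_rows = df[mask]
print(dup_rows)
```

The row labels of `dup_rows` are the "lines" the question asks about; the answers below turn them into a message.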

Answer 1

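import pandas as pd

df = pd.DataFrame({'StudentID': ['s123456', 's123789', 's123789', 's123579'],
                   'Name': ['Michael', 'Peter', 'Thomas', 'Marie']})

df = df.reset_index()  # Expose the row index as a column so it is still visible after the groupby

# Check how the df looks
print(df)
#    index StudentID     Name
# 0      0   s123456  Michael
# 1      1   s123789    Peter
# 2      2   s123789   Thomas
# 3      3   s123579    Marie

def my_func(x):
    # x is the array of row indices that share one StudentID
    count = len(x)
    rows = " and ".join(x.astype(str))
    return "{} students have the same student ID in line {}".format(count, rows)

# Keep every duplicated row (keep=False), then collect the row indices per StudentID
result = df[df.StudentID.duplicated(keep=False)].groupby('StudentID')['index'].unique().map(my_func)

# Print results
for line in result:
    print(line)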

2 students have the same student ID in line 1 and 2
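Answer 1 relies on `duplicated` with `keep` set to `False`. With the default `keep='first'`, only the second `s123789` row would be flagged, so the groupby would see a single row for that ID. A small comparison of the three settings on the question's IDs:

```python
import pandas as pd

ids = pd.Series(["s123456", "s123789", "s123789", "s123579"])

# keep="first": flag repeats after the first occurrence
first = ids.duplicated(keep="first").tolist()   # [False, False, True, False]
# keep="last": flag repeats before the last occurrence
last = ids.duplicated(keep="last").tolist()     # [False, True, False, False]
# keep=False: flag every row whose value occurs more than once
every = ids.duplicated(keep=False).tolist()     # [False, True, True, False]

print(first, last, every)
```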

Answer 2

Here is one way to do it using f-strings, which are available in Python 3.6+:

import pandas as pd

# example data
df = pd.DataFrame({'StudentID': ['s123456', 's123789', 's123789', 's123577', 's123456', 's123458'],
                   'Name': ['Michael', 'Peter', 'Thomas', 'Joe', 'Mark', 'Andrew']})

# get the StudentIDs that occur more than once
dups = df.loc[df['StudentID'].duplicated(keep=False), 'StudentID'].unique()

# iterate duplicates
for stid in dups:
    dup_idx = df[df['StudentID'] == stid].index.tolist()
    print(f'{len(dup_idx)} Students have the same student id in lines: {dup_idx}')

2 Students have the same student id in lines: [0, 4]
2 Students have the same student id in lines: [1, 2]
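The loop above filters the whole frame once per duplicated ID. A single-pass sketch on the same data uses `GroupBy.indices`, which maps each group key to an array of integer row positions. These positions match the index labels here only because `df` has the default RangeIndex; with a custom index, `.groups` (which returns labels) would be the safer choice. The sorting step is an assumption added so the output follows the order in which the IDs first appear:

```python
import pandas as pd

# Example data from this answer
df = pd.DataFrame({
    "StudentID": ["s123456", "s123789", "s123789", "s123577", "s123456", "s123458"],
    "Name": ["Michael", "Peter", "Thomas", "Joe", "Mark", "Andrew"],
})

groups = df.groupby("StudentID").indices  # {StudentID: array of row positions}

# Keep only IDs seen more than once, ordered by where they first appear
dups = sorted(
    ((stid, pos) for stid, pos in groups.items() if len(pos) > 1),
    key=lambda item: min(item[1]),
)

messages = []
for stid, pos in dups:
    lines = " and ".join(str(p) for p in sorted(pos))
    messages.append(f"{len(pos)} students have the same student id in line {lines}")

for msg in messages:
    print(msg)
```

This produces the wording requested in the question; an ID that occurs three or more times simply gets more "and"-joined line numbers.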

What if the same element appears twice in a DataFrame?
