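The first column of my DataFrame is a list of random student IDs. I want to find out whether any student ID occurs twice. If so, I want to print the two rows in which it occurs.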
StudentID Name
s123456 Michael
s123789 Peter
s123789 Thomas
s123579 Marie
I want to print:
"Two students have the same student id in line {} and {}"
Answer 0 (score: 1)
df = df.reset_index() # So a row value is visible after the groupby
# Check how the df looks
print(df)
   index StudentID     Name
0      0   s123456  Michael
1      1   s123789    Peter
2      2   s123789   Thomas
3      3   s123579    Marie
def my_func(x):
    count = len(x)
    rows = " and ".join(x.astype(str))
    return "{} students have the same student ID in line {}".format(count, rows)
df = df[df.StudentID.duplicated(keep=False)].groupby('StudentID')['index'].unique().map(my_func)
# Print results
for i in df:
    print(i)
2 students have the same student ID in line 1 and 2
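For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same groupby idea. It assumes the DataFrame has a default integer index and rebuilds the sample data from the question; the names `describe`, `dup_rows`, and `messages` are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "StudentID": ["s123456", "s123789", "s123789", "s123579"],
    "Name": ["Michael", "Peter", "Thomas", "Marie"],
})

def describe(rows):
    # rows holds the row labels that share one StudentID
    joined = " and ".join(str(r) for r in rows)
    return "{} students have the same student ID in line {}".format(len(rows), joined)

# Keep every row whose StudentID occurs more than once; reset_index()
# copies the row labels into a column named "index".
dup_rows = df[df["StudentID"].duplicated(keep=False)].reset_index()

# Collect the row labels per StudentID and turn each group into a message.
messages = dup_rows.groupby("StudentID")["index"].unique().map(describe).tolist()

for message in messages:
    print(message)
```

Unlike the snippet above, this version leaves `df` itself unchanged, so the original data is still available afterwards.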
Answer 1 (score: 0)
Here is one approach using f-strings, which are available in Python 3.6+:
# example data
StudentID Name
s123456 Michael
s123789 Peter
s123789 Thomas
s123577 Joe
s123456 Mark
s123458 Andrew
# get duplicated StudentIDs
dups = df.loc[df['StudentID'].duplicated(keep=False), 'StudentID'].unique()
# iterate over the duplicated IDs
for stid in dups:
    dup_idx = df[df['StudentID'] == stid].index.tolist()
    print(f'{len(dup_idx)} Students have the same student id in lines: {dup_idx}')
2 Students have the same student id in lines: [0, 4]
2 Students have the same student id in lines: [1, 2]
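A runnable variant of this approach might look like the sketch below. It builds the six-row sample data explicitly and wraps the logic in a helper, `duplicate_messages`, which is a hypothetical name chosen here for illustration. It also swaps the per-ID boolean filter for a single `groupby` pass, which is a different technique from the loop above but should give the same result for this data.

```python
import pandas as pd

# Six-row sample data from this answer.
df = pd.DataFrame({
    "StudentID": ["s123456", "s123789", "s123789", "s123577", "s123456", "s123458"],
    "Name": ["Michael", "Peter", "Thomas", "Joe", "Mark", "Andrew"],
})

def duplicate_messages(frame, column="StudentID"):
    """Return one message per value of `column` that occurs more than once."""
    mask = frame[column].duplicated(keep=False)
    messages = []
    # sort=False keeps the IDs in order of first appearance.
    for _, group in frame[mask].groupby(column, sort=False):
        rows = group.index.tolist()
        messages.append(f"{len(rows)} Students have the same student id in lines: {rows}")
    return messages

messages = duplicate_messages(df)
for message in messages:
    print(message)
```

Returning the messages as a list instead of printing inside the loop makes the result easier to reuse or check.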