How do I detect when the same element appears twice in a DataFrame?

Posted: 2018-06-15 13:10:33

Tags: python python-3.x pandas dataframe

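The first column of my DataFrame is a random list of student IDs. I want to know whether any student ID occurs twice. If so, I want to print the two rows where it occurs.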

StudentID   Name
s123456     Michael
s123789     Peter
s123789     Thomas 
s123579     Marie

I want to print:

"Two students have the same student id in line {} and {}"

2 answers:

Answer 0 (score: 1)

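import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "StudentID": ["s123456", "s123789", "s123789", "s123579"],
    "Name": ["Michael", "Peter", "Thomas", "Marie"],
})

df = df.reset_index()  # Copy the row index into an 'index' column so it is still visible after the groupby

# Check how the df looks
print(df)
   index StudentID     Name
0      0   s123456  Michael
1      1   s123789    Peter
2      2   s123789   Thomas
3      3   s123579    Marie

def my_func(x):
    # x is an array of the row indices that share one StudentID
    count = len(x)
    rows = " and ".join(x.astype(str))
    return "{} students have the same student ID in lines {}".format(count, rows)

# keep=False flags every occurrence of a duplicated ID, not only the repeats
messages = df[df.StudentID.duplicated(keep=False)].groupby('StudentID')['index'].unique().map(my_func)

# Print results
for msg in messages:
    print(msg)

2 students have the same student ID in lines 1 and 2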
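Both answers hinge on `Series.duplicated`, whose `keep` parameter decides which occurrences get flagged. The small sketch below, which rebuilds the question's sample data as an assumption, shows why `keep=False` is the right choice here: it is the only setting that marks both rows of a pair.

```python
import pandas as pd

# Sample data from the question (reconstructed for illustration)
df = pd.DataFrame({
    "StudentID": ["s123456", "s123789", "s123789", "s123579"],
    "Name": ["Michael", "Peter", "Thomas", "Marie"],
})

ids = df["StudentID"]

# keep="first": every occurrence except the first one is flagged
first = ids.duplicated(keep="first").tolist()
# keep="last": every occurrence except the last one is flagged
last = ids.duplicated(keep="last").tolist()
# keep=False: all occurrences of a repeated value are flagged
all_dups = ids.duplicated(keep=False).tolist()

print(first)     # [False, False, True, False]
print(last)      # [False, True, False, False]
print(all_dups)  # [False, True, True, False]
```

With the default `keep="first"` only row 2 would be selected, so the message could never name both lines.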

Answer 1 (score: 0)

Here is one way to do it with f-strings, which are available in Python 3.6+:

# example data
StudentID   Name
s123456     Michael
s123789     Peter
s123789     Thomas 
s123577     Joe
s123456     Mark
s123458     Andrew

# get duplicate StudentIDs
dups = df.loc[df['StudentID'].duplicated(keep=False), 'StudentID'].unique()

# iterate duplicates
for stid in dups:
    dup_idx = df[df['StudentID'] == stid].index.tolist()
    print(f'{len(dup_idx)} Students have the same student id in lines: {dup_idx}')

2 Students have the same student id in lines: [0, 4]
2 Students have the same student id in lines: [1, 2]
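The loop above filters the whole DataFrame once per duplicated ID. As an alternative that is not taken from either answer, `GroupBy.groups` collects the index labels of every StudentID in a single pass. The sketch below assumes the second answer's example data and a default `RangeIndex`, so the reported "lines" are 0-based index labels. If the question means 1-based line numbers, add 1.

```python
import pandas as pd

# Example data from the second answer (reconstructed for illustration)
df = pd.DataFrame({
    "StudentID": ["s123456", "s123789", "s123789", "s123577", "s123456", "s123458"],
    "Name": ["Michael", "Peter", "Thomas", "Joe", "Mark", "Andrew"],
})

# GroupBy.groups maps each StudentID to the index labels of its rows
groups = df.groupby("StudentID").groups

messages = []
for stid, idx in groups.items():
    if len(idx) > 1:  # keep only IDs that occur more than once
        rows = " and ".join(str(i) for i in idx)
        messages.append(
            f"{len(idx)} students have the same student ID ({stid}) in lines {rows}"
        )

for msg in messages:
    print(msg)
```

Because `groupby` sorts its keys by default, the messages come out in StudentID order rather than in order of first appearance.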