Question

I have a DataFrame built from a CSV file that contains StudentID, Name, and Assignment 1, 2, 3, ... The CSV file is supplied as input, so the values may vary.

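If the student IDs are not unique, I want to print a list of error messages. The code below works as long as gradesM3.csv contains duplicates:

        import pandas as pd

        grades = pd.read_csv('gradesM3.csv',sep=';')
        duplicates = pd.concat(g for _, g in grades.groupby("StudentID") if len(g) > 1)
        zipped = zip(duplicates['StudentID'])
        for student in zipped:
            print(f'The student ID {student} appears multiple times.')

However, if I change the CSV file so that it contains no duplicate student IDs, I get the following error: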

ValueError: No objects to concatenate

I am trying to write code that prints the following if there are duplicates:

The student ID ('s123789',) appears multiple times.

The student ID ('s123789',) appears multiple times.

The student ID ('s123789',) appears multiple times.

and, if there are none, prints the following:

There are no duplicates in your file.

I tried the following code:

        grades = pd.read_csv('gradesM3.csv',sep=';')
        duplicates = pd.concat(g for _, g in grades.groupby("StudentID") if len(g) > 1) 
        if len(duplicates)>0:
            zipped = zip(duplicates['StudentID'])
            for student in zipped:
                print(f'The student ID {student} appears multiple times.')
        else:
            print('There are no duplicates in your file.')

But I get the same error message:

ValueError: No objects to concatenate.

Thanks in advance for your help.

Answer 1

The error is raised by this line:

duplicates = pd.concat(g for _, g in grades.groupby("StudentID") if len(g) > 1)

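Because you only handle the empty case after that line, the error is raised before your check runs: when no StudentID is repeated, the generator yields nothing and pd.concat has nothing to concatenate. One solution is to use try/except: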

import pandas as pd

grades = pd.read_csv('gradesM3.csv',sep=';')
try:
    # pd.concat raises ValueError when no group has more than one row
    duplicates = pd.concat(g for _, g in grades.groupby("StudentID") if len(g) > 1)
    zipped = zip(duplicates['StudentID'])
    for student in zipped:
        print(f'The student ID {student} appears multiple times.')
except ValueError:
    print('There are no duplicates in your file.')
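
If you would rather not rely on catching the exception, another option is to collect the duplicated groups in a list first and only call pd.concat when that list is non-empty. The following is a minimal sketch of that idea, not a definitive implementation: the helper name report_duplicates and the two small sample frames are made up for illustration and stand in for the data read from gradesM3.csv. It also prints the bare ID rather than the one-element tuple that zip produces.

```python
import pandas as pd


def report_duplicates(grades, column="StudentID"):
    """Return one message per row whose `column` value occurs more than once."""
    # Build the list of groups first so emptiness can be checked before concat.
    groups = [g for _, g in grades.groupby(column) if len(g) > 1]
    if not groups:
        return ["There are no duplicates in your file."]
    duplicates = pd.concat(groups)
    return [f"The student ID {sid} appears multiple times."
            for sid in duplicates[column]]


# Hypothetical sample data standing in for gradesM3.csv
with_dupes = pd.DataFrame({"StudentID": ["s1", "s2", "s1"],
                           "Name": ["A", "B", "A"]})
without_dupes = pd.DataFrame({"StudentID": ["s1", "s2"],
                              "Name": ["A", "B"]})

for message in report_duplicates(with_dupes) + report_duplicates(without_dupes):
    print(message)
```

Returning the messages as a list, instead of printing inside the helper, keeps the check easy to reuse and to test.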

Answer 2

A more direct solution is to use the pandas duplicated method:

import pandas as pd

# Example data
df = pd.DataFrame({'id' : [1,2,2,4, 5, 1], 'name' : ["a", "b", "b", "d", "e", "a"]})
print(df)

#   id name
#0   1    a
#1   2    b
#2   2    b
#3   4    d
#4   5    e
#5   1    a

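# Get the duplicates - each df row whose id already appeared in an earlier row
# (duplicated() defaults to keep='first', so first occurrences are not flagged)
df_duplicates  = df[df['id'].duplicated()]

for student_id in df_duplicates['id']:
    print(f"Student {student_id} is a duplicate")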


#Student 2 is a duplicate
#Student 1 is a duplicate
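
To get closer to the output asked for in the question (one line per occurrence, plus a message when nothing is duplicated), the same method can be called with keep=False, which flags every row of a repeated ID including the first one. This is only a sketch under assumptions: the sample frame below is invented and takes the place of the frame read from the CSV file.

```python
import pandas as pd

# Hypothetical sample data; in the question this would come from pd.read_csv.
grades = pd.DataFrame({
    "StudentID": ["s123", "s456", "s123", "s789"],
    "Name": ["Ann", "Bob", "Ann", "Cy"],
})

# keep=False marks every row whose StudentID occurs more than once,
# including the first occurrence.
mask = grades["StudentID"].duplicated(keep=False)

if mask.any():
    messages = [f"The student ID {sid} appears multiple times."
                for sid in grades.loc[mask, "StudentID"]]
else:
    messages = ["There are no duplicates in your file."]

for message in messages:
    print(message)
```

Because the boolean mask is always defined, even when it is all False, there is no empty concat to guard against.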

