Only if there are objects

Time: 2019-01-24 14:45:22

Tags: python pandas dataframe

I have a data frame built from a CSV file that contains StudentID, Name, and Assignment 1, 2, 3, ... The CSV file is supplied as input, so the values may vary.

I want to print a list of error messages if the student IDs are not unique. The code below works fine as long as gradesM3.csv contains duplicates:

        grades = pd.read_csv('gradesM3.csv',sep=';')
        duplicates = pd.concat(g for _, g in grades.groupby("StudentID") if len(g) > 1)
        zipped = zip(duplicates['StudentID'])
        for student in zipped:
            print(f'The student ID {student} appears multiple times.')

However, if I change the CSV file so that it contains no duplicate student IDs, I get the following error:

ValueError: No objects to concatenate

I am trying to write code that prints the following if there are duplicates:

The student ID ('s123789',) appears multiple times.

The student ID ('s123789',) appears multiple times.

The student ID ('s123789',) appears multiple times.

And the following if there are none:

There are no duplicates in your file. 

I tried the following code:

        grades = pd.read_csv('gradesM3.csv',sep=';')
        duplicates = pd.concat(g for _, g in grades.groupby("StudentID") if len(g) > 1) 
        if len(duplicates)>0:
            zipped = zip(duplicates['StudentID'])
            for student in zipped:
                print(f'The student ID {student} appears multiple times.')
        else:
            print('The grades are correctly scaled along the 7-point grading system.')

But I get the same error message:

ValueError: No objects to concatenate. 

Thanks in advance for your help.

2 answers:

Answer 0: (score: 1)

The problem is that the error is raised by the following line:

duplicates = pd.concat(g for _, g in grades.groupby("StudentID") if len(g) > 1) 

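Because you only handle the empty case after that line, the error still occurs: pd.concat has already failed by the time the if statement is reached. One solution is to use try/except: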

import pandas as pd

grades = pd.read_csv('gradesM3.csv', sep=';')
try:
    # pd.concat raises ValueError when no group has more than one row
    duplicates = pd.concat(g for _, g in grades.groupby("StudentID") if len(g) > 1)
    zipped = zip(duplicates['StudentID'])
    for student in zipped:
        print(f'The student ID {student} appears multiple times.')
except ValueError:
    # No duplicated IDs were found
    print('There are no duplicates in your file.')
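
If you would rather not rely on exception handling, one alternative is to collect the duplicate groups in a list first and call pd.concat only when that list is non-empty. The following is a minimal sketch, not a definitive implementation: `find_duplicate_rows` is a hypothetical helper name, and the small in-memory frames are made-up data standing in for gradesM3.csv. It prints plain IDs rather than the one-element tuples that zip produces.

```python
import pandas as pd


def find_duplicate_rows(grades, column="StudentID"):
    """Return all rows whose `column` value occurs more than once (hypothetical helper)."""
    groups = [g for _, g in grades.groupby(column) if len(g) > 1]
    if not groups:
        # pd.concat raises "ValueError: No objects to concatenate" on an empty
        # list, so return an empty frame with the same columns instead
        return grades.iloc[0:0]
    return pd.concat(groups)


# Made-up sample data standing in for gradesM3.csv
with_dups = pd.DataFrame({"StudentID": ["s1", "s2", "s1"], "Name": ["A", "B", "A"]})
without_dups = pd.DataFrame({"StudentID": ["s1", "s2"], "Name": ["A", "B"]})

for frame in (with_dups, without_dups):
    duplicates = find_duplicate_rows(frame)
    if duplicates.empty:
        print("There are no duplicates in your file.")
    else:
        for student in duplicates["StudentID"]:
            print(f"The student ID {student} appears multiple times.")
```

Building the list first keeps the "no duplicates" case an ordinary branch instead of an exception, so an unrelated ValueError from elsewhere in the block cannot be mistaken for it.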

Answer 1: (score: 1)

A more direct solution is to use the pandas duplicated method:

import pandas as pd

# Example data
df = pd.DataFrame({'id' : [1,2,2,4, 5, 1], 'name' : ["a", "b", "b", "d", "e", "a"]})
print(df)

#   id name
#0   1    a
#1   2    b
#2   2    b
#3   4    d
#4   5    e
#5   1    a

# Get the duplicates - each df row where the id column is duplicated
df_duplicates = df[df['id'].duplicated()]

# Avoid naming the loop variable `id`, which would shadow the built-in
for student_id in df_duplicates['id']:
    print(f"Student {student_id} is a duplicate")


#Student 2 is a duplicate
#Student 1 is a duplicate
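
Note that duplicated() by default (keep='first') flags only the second and later occurrences, which is why each ID above is reported once. To report every row with a repeated ID, as in the question's expected output, passing keep=False flags all occurrences, and the empty case becomes a simple check on the mask. A sketch under stated assumptions: the column is named StudentID as in the question, the sample data is made up, and `report_duplicates` is a hypothetical helper name.

```python
import pandas as pd


def report_duplicates(grades, column="StudentID"):
    """Build one message per row whose ID is duplicated (hypothetical helper)."""
    # keep=False marks every occurrence of a repeated value, not just the later ones
    mask = grades[column].duplicated(keep=False)
    if not mask.any():
        return ["There are no duplicates in your file."]
    return [f"The student ID {sid} appears multiple times."
            for sid in grades.loc[mask, column]]


# Made-up sample data standing in for gradesM3.csv
grades = pd.DataFrame({
    "StudentID": ["s123789", "s100000", "s123789", "s123789"],
    "Name": ["Ann", "Bob", "Ann", "Ann"],
})
for line in report_duplicates(grades):
    print(line)
```

Unlike the groupby/concat approach, this never calls pd.concat, so the "No objects to concatenate" error cannot occur.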