ValueError when deriving a new column in Python Pandas

Time: 2019-09-21 15:13:56

Tags: python

I have a table with an exam score column:

Exam Score  
120
700
1000

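If a student's exam score is above 850, they should get grade A; if the score is between 801 and 850, they should get grade B, and so on.

My task: create a Grade column that gives each student's grade.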

Grade    Exam score
A        >850
B        801-850
C        751-800
D        701-750
E        651-700
F        601-650
G        550-600

The code is as follows:

workbook = pd.read_csv("Examscores_raw.csv")

def letter(row):
        if workbook['Exam score']>850:
            return 'A'
        elif (workbook['Exam score']>801):
            return 'B'
        elif (workbook['Exam score']>751):
            return 'C'
        elif (workbook['Exam score']>701):
            return 'D'
        elif (workbook['Exam score']>651):
            return 'E'
        elif (workbook['Exam score']>601):
            return 'F'
        elif (workbook['Exam score']>550):
            return 'G'
        else:
            return 'Fail'
workbook['Grade'] = workbook.apply(lambda row: letter(row), axis=1)

The error I get:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')

I have gone through many answers online but could not resolve this. Please help.

2 answers:

Answer 0 (score: 2)

You can make the code somewhat simpler:

# Declare conditions, and corresponding categories
conditions, type_choices = (
    [
        (df["exam_score"] >= 35),
        (df["exam_score"] < 35),
    ],
    ["Pass", "Fail"],
)

# Tag Pass/Fail based on the above conditions
df["result"] = np.select(conditions, type_choices, default="Fail")

For your use case:

import pandas as pd
import numpy as np
workbook = pd.read_csv("Examscores_raw.csv")

# Declare conditions, and corresponding categories.
# np.select uses the first condition that is True for each row,
# so the thresholds must be listed from highest to lowest.
conditions, type_choices = (
    [
        (workbook["Exam score"] > 850),
        (workbook["Exam score"] > 801),
        (workbook["Exam score"] > 751),
        (workbook["Exam score"] > 701),
        (workbook["Exam score"] > 651),
        (workbook["Exam score"] > 601),
        (workbook["Exam score"] > 550),
        (workbook["Exam score"] <= 550),
    ],
    ["A", "B", "C", "D", "E", "F", "G", "Fail"],
)

# Assign a letter grade based on the above conditions
workbook["Grade"] = np.select(conditions, type_choices, default="Fail")
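One thing worth checking against the grade table: the comparisons above (> 801, > 550, and so on) leave out the lower edge of each band, so a score of exactly 801 would get C rather than B. As a rough sketch of another vectorized option, pd.cut can take the band edges directly. This assumes the scores are whole numbers, and the sample data below is made up for illustration (the real file Examscores_raw.csv is not available here):

```python
import numpy as np
import pandas as pd

# Illustrative scores only; a real run would load the CSV with pd.read_csv.
workbook = pd.DataFrame({"Exam score": [120, 700, 1000, 801, 550]})

# Bin edges follow the grade table, assuming integer scores.
# pd.cut uses right-closed intervals by default:
# (549, 600] -> G, (600, 650] -> F, ..., (800, 850] -> B, (850, inf] -> A
bins = [549, 600, 650, 700, 750, 800, 850, np.inf]
labels = ["G", "F", "E", "D", "C", "B", "A"]

grades = pd.cut(workbook["Exam score"], bins=bins, labels=labels)

# Scores below 550 fall outside every bin and come back as NaN,
# so add a "Fail" category and use it to fill those gaps.
workbook["Grade"] = grades.cat.add_categories("Fail").fillna("Fail")

print(workbook)
```

If the scores can be fractional, the edges would need rethinking (for example, whether 800.5 belongs to B or C), which the table in the question does not say.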

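Answer 1 (score: 2)

@pissall's answer will work, but the actual error in the code you posted is inside the letter function: every workbook in it needs to be changed to row. workbook['Exam score'] > 850 compares the whole column and returns a boolean Series, which an if statement cannot reduce to a single True or False, whereas row['Exam score'] is a single value. The lambda is also unnecessary, as are the parentheses in the elif conditions. So you would have: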

import pandas as pd

workbook = pd.read_csv("Examscores_raw.csv")

def letter(row):
    # row is a single row of the DataFrame, so each comparison is a scalar
    if row['Exam score'] > 850:
        return 'A'
    elif row['Exam score'] > 801:
        return 'B'
    elif row['Exam score'] > 751:
        return 'C'
    elif row['Exam score'] > 701:
        return 'D'
    elif row['Exam score'] > 651:
        return 'E'
    elif row['Exam score'] > 601:
        return 'F'
    elif row['Exam score'] > 550:
        return 'G'
    else:
        return 'Fail'

workbook['Grade'] = workbook.apply(letter, axis=1)
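To see where the ValueError comes from, here is a small reproduction. It is only a sketch: the DataFrame is built in memory from the three scores shown in the question rather than read from the CSV, and the names mask, raised, and is_a are just for illustration.

```python
import pandas as pd

# The three scores from the question, built in memory instead of read from CSV.
workbook = pd.DataFrame({"Exam score": [120, 700, 1000]})

# Comparing the whole column yields one boolean per row, not a single True/False.
mask = workbook["Exam score"] > 850
print(mask.tolist())

# `if` needs a single truth value, so pandas refuses to guess and raises.
try:
    if mask:
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)

# With apply(..., axis=1) the function receives one row at a time,
# so the same comparison on a row gives a single boolean.
first_row = workbook.iloc[0]
is_a = bool(first_row["Exam score"] > 850)
print(is_a)
```

This is the same situation as in the original letter function: it ignored its row argument and compared the full workbook column on every call, which is why the error was reported at index 0.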