Question

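I have a dataset as follows:

name        points    attempts
'Alex'        2          4
'Brian'       1          2
'Cathy'       3          5
'Daniel'      5          7

Suppose I have code of some form:

for name in df:
    if points > 2:
        grade = 'pass'
    else:
        grade = 'fail'

    average_points = points/attempts
    attempts_left = 10 - attempts

What I'm looking to achieve here is an output table (in a pandas DataFrame) of the form:

name        grade    average_points    attempts_left
'Alex'      fail          0.5               6
'Brian'     fail          0.5               8
'Cathy'     pass          0.6               5
'Daniel'    pass          0.71              3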

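The problem is that I'm not sure which return/append function I should use in my code. I also know it would probably be simpler to add the 'grade', 'average_points' and 'attempts_left' columns to the original dataset, but that approach doesn't work in my case, because my actual data is more complex than the working example above.

Any help would be appreciated. Thanks!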

Answer 1

You can vectorize your operations and use assign:

In [839]: df.assign(attempts_left=10 - df.attempts,
     ...:           average_points=df.points / df.attempts,
     ...:           grade=np.where(df.points > 2, 'pass', 'fail'))
Out[839]:
       name  points  attempts  attempts_left  average_points grade
0    'Alex'       2         4              6        0.500000  fail
1   'Brian'       1         2              8        0.500000  fail
2   'Cathy'       3         5              5        0.600000  pass
3  'Daniel'       5         7              3        0.714286  pass
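
For readers who want to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. The sample frame is rebuilt from the values shown in the output above, and the variable name `result` is my own choice, not part of the answer. Because `assign` returns a new DataFrame, the original `df` is left untouched, which may help if you would rather not add columns to the original data.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the output shown above
df = pd.DataFrame({
    'name': ['Alex', 'Brian', 'Cathy', 'Daniel'],
    'points': [2, 1, 3, 5],
    'attempts': [4, 2, 5, 7],
})

# assign() returns a new DataFrame; df itself is not modified
result = df.assign(
    grade=np.where(df['points'] > 2, 'pass', 'fail'),
    average_points=df['points'] / df['attempts'],
    attempts_left=10 - df['attempts'],
)[['name', 'grade', 'average_points', 'attempts_left']]  # keep only the wanted columns

print(result)
```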

Answer 2

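Use pandas.DataFrame() and df.append:

import pandas

# Note: DataFrame.append was removed in pandas 2.0. On newer versions,
# use pandas.concat([df2, pandas.DataFrame(new_row, index=[i])]) instead.
df2 = pandas.DataFrame()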
for i,row in df.iterrows():
    points = row["points"]
    attempts = row["attempts"]
    new_row = {}
    new_row["name"] = row["name"]
    if points > 2:
        new_row["grade"] = 'pass'
    else:
        new_row["grade"] = 'fail'

    new_row["average_points"] = points/attempts
    new_row["attempts_left"] = 10 - attempts
    df2 = df2.append(pandas.DataFrame(new_row,index=[i]))
print(df2)

Output:

   attempts_left  average_points grade    name
0              6        0.500000  fail    Alex
1              8        0.500000  fail   Brian
2              5        0.600000  pass   Cathy
3              3        0.714286  pass  Daniel
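
Since `DataFrame.append` is no longer available in pandas 2.0 and later, a loop-based variant that should work on current versions is to collect one dict per row in a plain list and build the DataFrame once at the end. This is only a sketch of that pattern, not the answerer's code; the sample data and the list name `rows` are assumptions for illustration.

```python
import pandas as pd

# Sample data matching the question
df = pd.DataFrame({
    'name': ['Alex', 'Brian', 'Cathy', 'Daniel'],
    'points': [2, 1, 3, 5],
    'attempts': [4, 2, 5, 7],
})

rows = []  # one dict per output row
for _, row in df.iterrows():
    points = row['points']
    attempts = row['attempts']
    rows.append({
        'name': row['name'],
        'grade': 'pass' if points > 2 else 'fail',
        'average_points': points / attempts,
        'attempts_left': 10 - attempts,
    })

# Build the result in one step instead of appending row by row
df2 = pd.DataFrame(rows, index=df.index)
print(df2)
```

Building the frame once also avoids copying the growing DataFrame on every iteration, which is what repeated appends do.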

Answer 3

Use apply:

import pandas as pd

df = pd.DataFrame([
    ['Alex', 2, 4],
    ['Brian', 1, 2],
    ['Cathy', 3, 5],
    ['Daniel', 5, 7],
], columns=['name', 'points', 'attempts'])

# 'pass' when points exceed 2, otherwise 'fail'
df['grade'] = df['points'].apply(lambda points: 'pass' if points > 2 else 'fail')
# attempts remaining out of a maximum of 10
df['attempts_left'] = df['attempts'].apply(lambda attempts: 10 - attempts)
# row-wise average of points per attempt
df['average_points'] = df[['points', 'attempts']].apply(lambda row: row['points']/row['attempts'], axis=1)

# select only the columns wanted in the output table
new_df = df[['name', 'grade', 'average_points', 'attempts_left']]
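
If the per-row logic is more involved than these three expressions, as the question suggests, one option is to keep it in a single function and apply that row-wise. The sketch below assumes the same sample data; `summarize` is a hypothetical helper name. When the function returns a Series, `apply(..., axis=1)` assembles the results into a new DataFrame and leaves `df` without extra columns. Row-wise `apply` is generally slower than vectorized operations, so it is mainly worth it when the logic is hard to vectorize.

```python
import pandas as pd

df = pd.DataFrame([
    ['Alex', 2, 4],
    ['Brian', 1, 2],
    ['Cathy', 3, 5],
    ['Daniel', 5, 7],
], columns=['name', 'points', 'attempts'])


def summarize(row):
    """Hypothetical helper: compute the derived values for one row."""
    return pd.Series({
        'name': row['name'],
        'grade': 'pass' if row['points'] > 2 else 'fail',
        'average_points': row['points'] / row['attempts'],
        'attempts_left': 10 - row['attempts'],
    })


# Each returned Series becomes one row of the new DataFrame
new_df = df.apply(summarize, axis=1)
print(new_df)
```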

Output results as a table in pandas
