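I have a dataset like this:

name      points  attempts
'Alex'    2       4
'Brian'   1       2
'Cathy'   3       5
'Daniel'  5       7

Suppose I have some code of the form:

for _, row in df.iterrows():
    points = row['points']
    attempts = row['attempts']
    if points > 2:
        grade = 'pass'
    else:
        grade = 'fail'
    average_points = points / attempts
    attempts_left = 10 - attempts

What I want to achieve here is an output table (as a pandas DataFrame) like this:

name      grade  average_points  attempts_left
'Alex'    fail   0.5             6
'Brian'   fail   0.5             8
'Cathy'   pass   0.6             5
'Daniel'  pass   0.71            3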
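The problem is that I'm not sure which return/append function I should use in the code. I also know it would probably be simpler to add the "grade", "average_points" and "attempts_left" columns to the original dataset, but that approach doesn't work in my case, because my real data is more complex than the working example above.

Any help would be greatly appreciated. Thanks!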
Answer 0 (score: 4)
You can vectorize the operations and use DataFrame.assign:
import numpy as np

print(df.assign(attempts_left=10 - df.attempts,
                average_points=df.points / df.attempts,
                grade=np.where(df.points > 2, 'pass', 'fail')))

Output:
name points attempts attempts_left average_points grade
0 'Alex' 2 4 6 0.500000 fail
1 'Brian' 1 2 8 0.500000 fail
2 'Cathy' 3 5 5 0.600000 pass
3 'Daniel' 5 7 3 0.714286 pass
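The assign call above keeps the original points and attempts columns next to the new ones. A minimal sketch that chains a column selection to produce exactly the table asked for, assuming the four-row sample data shown in the question (the name result is just an illustrative variable):

```python
import numpy as np
import pandas as pd

# Sample data, assumed to match the question's dataset
df = pd.DataFrame({
    'name': ['Alex', 'Brian', 'Cathy', 'Daniel'],
    'points': [2, 1, 3, 5],
    'attempts': [4, 2, 5, 7],
})

# Compute the derived columns in one vectorized pass, then select
# only the requested columns, in the requested order
result = df.assign(
    grade=np.where(df['points'] > 2, 'pass', 'fail'),
    average_points=df['points'] / df['attempts'],
    attempts_left=10 - df['attempts'],
)[['name', 'grade', 'average_points', 'attempts_left']]

print(result)
```

Because every operation works on whole columns, there is no Python-level loop over rows, which usually matters once the data gets large.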
Answer 1 (score: 1)
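Use pandas.DataFrame() and df.append (note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; on current versions, pandas.concat does the same job):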
import pandas

rows = []
for i, row in df.iterrows():
    points = row["points"]
    attempts = row["attempts"]
    new_row = {}
    new_row["name"] = row["name"]
    if points > 2:
        new_row["grade"] = 'pass'
    else:
        new_row["grade"] = 'fail'
    new_row["average_points"] = points / attempts
    new_row["attempts_left"] = 10 - attempts
    # DataFrame.append no longer exists in pandas 2.x: collect the one-row
    # frames in a list and concatenate them once after the loop
    rows.append(pandas.DataFrame(new_row, index=[i]))
df2 = pandas.concat(rows)
print(df2)
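Output (from an older pandas version, which sorted the columns alphabetically; recent versions keep the order in which the keys were inserted into new_row):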
attempts_left average_points grade name
0 6 0.500000 fail Alex
1 8 0.500000 fail Brian
2 5 0.600000 pass Cathy
3 3 0.714286 pass Daniel
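Building a one-row DataFrame per iteration is fairly expensive. A common alternative, sketched here assuming df has the name, points and attempts columns from the question, is to let a plain function return one dict per row, collect the dicts in a list, and build the DataFrame once at the end. The helper name summarize_row is hypothetical:

```python
import pandas as pd

# Sample data, assumed to match the question's dataset
df = pd.DataFrame({
    'name': ['Alex', 'Brian', 'Cathy', 'Daniel'],
    'points': [2, 1, 3, 5],
    'attempts': [4, 2, 5, 7],
})


def summarize_row(row):
    """Hypothetical helper: return the derived values for one row as a dict."""
    points = row['points']
    attempts = row['attempts']
    return {
        'name': row['name'],
        'grade': 'pass' if points > 2 else 'fail',
        'average_points': points / attempts,
        'attempts_left': 10 - attempts,
    }


# Collect one dict per row in a plain Python list (cheap to append to)...
records = [summarize_row(row) for _, row in df.iterrows()]

# ...and build the result once; the dict keys become the column names
df2 = pd.DataFrame(records, index=df.index)
print(df2)
```

This keeps arbitrary per-row logic inside an ordinary function that returns its result, which may suit data that is too complex to vectorize.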
Answer 2 (score: 0)
Use apply:
import pandas as pd
df = pd.DataFrame([
['Alex', 2, 4],
['Brian', 1, 2],
['Cathy', 3, 5],
['Daniel', 5, 7],
], columns=['name', 'points', 'attempts'])
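# 'pass' if the student scored more than 2 points, otherwise 'fail'
df['grade'] = df['points'].apply(lambda points: 'pass' if points > 2 else 'fail')
# 10 minus the number of attempts already made
df['attempts_left'] = df['attempts'].apply(lambda attempts: 10 - attempts)
# Row-wise division; axis=1 passes each row to the lambda as a Series
df['average_points'] = df[['points', 'attempts']].apply(lambda row: row['points'] / row['attempts'], axis=1)
# Keep only the requested columns, in the requested order
new_df = df[['name', 'grade', 'average_points', 'attempts_left']]
print(new_df)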
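When the per-row logic is too involved to split into one apply per column, a single apply with axis=1 can compute all values at once: if the function returns a pandas Series, apply assembles the results into a DataFrame whose columns are the Series' index labels. A sketch assuming the same sample data; grade_row is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame([
    ['Alex', 2, 4],
    ['Brian', 1, 2],
    ['Cathy', 3, 5],
    ['Daniel', 5, 7],
], columns=['name', 'points', 'attempts'])


def grade_row(row):
    """Hypothetical per-row function; returns the derived values as a Series."""
    points, attempts = row['points'], row['attempts']
    return pd.Series({
        'name': row['name'],
        'grade': 'pass' if points > 2 else 'fail',
        'average_points': points / attempts,
        'attempts_left': 10 - attempts,
    })


# axis=1 calls grade_row once per row; the returned Series become the rows
# of a new DataFrame, so the original df is not modified
new_df = df.apply(grade_row, axis=1)
print(new_df)
```

Unlike the column-by-column version, this leaves the original df untouched and runs the row logic only once per row.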