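Let's start by calculating an attendance score for each student. Do the following: create a new column named attendence_score. Fill the column using the following conditions: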
No Absence = 5
1-5 Absences = 4
6-10 Absences = 3
11-15 Absences = 2
16-20 Absences = 1
21 or more Absences = 0
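The dataset has a column named absences.
My idea is to use if conditions to do this.
But I have searched through a lot of code here, and most of it is about filling NaN data. How can I solve my problem?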
Answer 0 (score: 1)
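The manual way:
import numpy as np   # used by the pd.cut and formula variants
import pandas as pd

s = df['absences']   # df is the DataFrame that holds the absences column
df.loc[s == 0, 'absence_score'] = 5
df.loc[s.between(1, 5), 'absence_score'] = 4    # between() includes both endpoints
df.loc[s.between(6, 10), 'absence_score'] = 3
df.loc[s.between(11, 15), 'absence_score'] = 2
df.loc[s.between(16, 20), 'absence_score'] = 1
df.loc[s >= 21, 'absence_score'] = 0            # 21 or more absences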
Using categories (pd.cut):
df['absence_score'] = pd.cut(df['absences'], [-np.inf, 0, 5, 10, 15, 20, np.inf], labels=range(5,-1,-1))
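A minimal sketch of how the pd.cut variant behaves, assuming a small made-up sample (the values below are hypothetical, not from the asker's dataset). It shows that the bins are closed on the right by default, and that the result is a categorical column which can be converted to plain integers if arithmetic is needed later:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the real df comes from the asker's dataset.
df = pd.DataFrame({'absences': [0, 1, 5, 6, 10, 11, 15, 16, 20, 21, 40]})

bins = [-np.inf, 0, 5, 10, 15, 20, np.inf]
# With the default right=True each bin is (left, right],
# so 5 falls in (0, 5] and 6 falls in (5, 10].
score = pd.cut(df['absences'], bins=bins, labels=[5, 4, 3, 2, 1, 0])

print(score.dtype)  # category

# Convert the categorical labels to plain integers.
df['absence_score'] = score.astype(int)
print(df)
```

The conversion step only works as written when no row is missing; rows with NaN absences stay NaN after pd.cut and would need to be handled first.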
Or you can take advantage of the uniform step between the levels and use a formula:
df['absence_score'] = 5 - np.ceil(df['absences'].div(5).clip(upper=5)).astype('int')
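Since the question mentions using if conditions, here is a hedged sketch of that route: a plain if/elif function applied with Series.apply, checked against the formula above. The function name score_one, the column names by_formula and by_rules, and the 0 to 30 sample range are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd

def score_one(n):
    """Score a single absence count using the rules from the question."""
    if n == 0:
        return 5
    elif n <= 5:
        return 4
    elif n <= 10:
        return 3
    elif n <= 15:
        return 2
    elif n <= 20:
        return 1
    return 0  # 21 or more absences

# Hypothetical sample: every absence count from 0 to 30.
df = pd.DataFrame({'absences': range(31)})

# Vectorized formula: divide by the step of 5, cap at 5, round up, subtract from 5.
df['by_formula'] = 5 - np.ceil(df['absences'].div(5).clip(upper=5)).astype('int')

# Row-by-row version using ordinary if/elif logic.
df['by_rules'] = df['absences'].apply(score_one)

# Check that both columns agree on every row.
print((df['by_formula'] == df['by_rules']).all())
```

The apply version is easier to read but runs a Python function per row; the formula and pd.cut versions operate on the whole column at once, which tends to matter on larger datasets.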
Answer 1 (score: 0)
conditions = [
(df['likes_count'] <= 2),
(df['likes_count'] > 2) & (df['likes_count'] <= 9),
(df['likes_count'] > 9) & (df['likes_count'] <= 15),
(df['likes_count'] > 15)
]
# create a list of the values we want to assign for each condition
values = ['tier_4', 'tier_3', 'tier_2', 'tier_1']
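# create a new column and use np.select to assign values to it using our lists as arguments;
# a string default ('unknown' is a placeholder) keeps the fallback the same type as the values
df['tier'] = np.select(conditions, values, default='unknown')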
# display updated DataFrame
df.head()
Or something like this?
Answer 2 (score: 0)
import numpy as np

df = student  # student is the existing DataFrame of student records
print(df)
#df['attendence_score'] = np.where((df['absences'] =0 ) ,5, df['attendence_score'])
#df.loc[df['absences'] = 0, 'attendence_score'] = 5
attendence_score = [
(df['absences'] == 0),
(df['absences'] > 0) & (df['absences'] <= 5),
(df['absences'] > 5) & (df['absences'] <= 10),
(df['absences'] > 10) & (df['absences'] <= 15),
(df['absences'] > 15) & (df['absences'] <= 20),
(df['absences'] > 20)
]
# create a list of the values we want to assign for each condition
values = [5, 4, 3, 2, 1, 0]
# create a new column and use np.select to assign values to it using our lists as arguments
df['attendence_score'] = np.select(attendence_score, values)
# display updated DataFrame
df.head()
I worked it out myself. I love myself!!!