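Python: fill a column in a DataFrame when a condition is met

Time: 2021-05-16 13:19:29

Tags: pandas dataframe

Let's start by calculating each student's attendance score. Do the following: create a new column named attendence_score, and fill it using the following conditions: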

No Absence = 5
1-5 Absences = 4
6-10 Absences = 3
11-15 Absences = 2
16-20 Absences = 1
21 or more Absences = 0

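The dataset has a column named absences.

My idea is to do this with if conditions.

But I have searched through a lot of code here, and most of it deals with filling NaN values. How can I solve my problem?

3 answers: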

Answer 0 (score: 1)

The manual way:

s = df['absences']
df.loc[s == 0, 'absence_score'] = 5
df.loc[s.between(1, 5), 'absence_score'] = 4
df.loc[s.between(6, 10), 'absence_score'] = 3
df.loc[s.between(11, 15), 'absence_score'] = 2
df.loc[s.between(16, 20), 'absence_score'] = 1
df.loc[s >= 21, 'absence_score'] = 0
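To try the manual approach end to end, here is a minimal sketch on a made-up frame (the sample values are an assumption, since the question does not show the real dataset). The last rule uses `>= 21` so that exactly 21 absences also gets a score. Note that filling a brand-new column through `.loc` produces floats, so the sketch casts back to integers at the end.

```python
import pandas as pd

# Hypothetical sample data; the real dataset is not shown in the question.
df = pd.DataFrame({'absences': [0, 1, 5, 6, 10, 11, 15, 16, 20, 21, 40]})

s = df['absences']
df.loc[s == 0, 'absence_score'] = 5
df.loc[s.between(1, 5), 'absence_score'] = 4    # between() is inclusive on both ends
df.loc[s.between(6, 10), 'absence_score'] = 3
df.loc[s.between(11, 15), 'absence_score'] = 2
df.loc[s.between(16, 20), 'absence_score'] = 1
df.loc[s >= 21, 'absence_score'] = 0

# The column starts out as float because rows not yet assigned hold NaN;
# every row is covered here, so it can be cast to int safely.
df['absence_score'] = df['absence_score'].astype(int)
print(df)
```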

Using categories:

df['absence_score'] = pd.cut(df['absences'], [-np.inf, 0, 5, 10, 15, 20, np.inf], labels=range(5,-1,-1))
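A quick sketch of how the bins line up, again on made-up values. By default `pd.cut` treats each bin as open on the left and closed on the right, so the edges `[-inf, 0, 5, ...]` mean `(-inf, 0]`, `(0, 5]`, and so on. The result is a categorical column, so I convert it to plain integers here in case you want to do arithmetic on the scores.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, chosen to hit the bin edges.
df = pd.DataFrame({'absences': [0, 1, 5, 6, 20, 21, 93]})

bins = [-np.inf, 0, 5, 10, 15, 20, np.inf]   # right-inclusive by default
labels = [5, 4, 3, 2, 1, 0]                  # one label per bin, left to right

cut_result = pd.cut(df['absences'], bins=bins, labels=labels)

# pd.cut returns a categorical Series; cast to int for numeric use.
df['absence_score'] = cut_result.astype(int)
print(df)
```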

Or you can take advantage of the uniform step between levels and use a mathematical formula:

df['absence_score'] = 5 - np.ceil(df['absences'].div(5).clip(upper=5)).astype('int')
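If the formula looks too clever, it can be checked against the rule table written out as plain if/elif branches, which is also roughly the "if" idea from the question. This is only a sketch: `score_from_rules` is a helper name I made up, and the range 0 to 30 is an arbitrary test range.

```python
import numpy as np
import pandas as pd

def score_from_rules(absences: int) -> int:
    """Hypothetical helper: the rule table from the question as if/elif branches."""
    if absences == 0:
        return 5
    elif absences <= 5:
        return 4
    elif absences <= 10:
        return 3
    elif absences <= 15:
        return 2
    elif absences <= 20:
        return 1
    return 0

# Arbitrary test range covering every bracket, including 21 and above.
df = pd.DataFrame({'absences': range(0, 31)})

expected = df['absences'].apply(score_from_rules)
formula = 5 - np.ceil(df['absences'].div(5).clip(upper=5)).astype('int')

# True when the formula reproduces the rule table on every tested value.
matches = bool((expected == formula).all())
print(matches)
```

The row-by-row `apply` is fine for a check like this, but on a large frame the vectorized versions will be much faster.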

Answer 1 (score: 0)

conditions = [
    (df['likes_count'] <= 2),
    (df['likes_count'] > 2) & (df['likes_count'] <= 9),
    (df['likes_count'] > 9) & (df['likes_count'] <= 15),
    (df['likes_count'] > 15)
    ]

# create a list of the values we want to assign for each condition
values = ['tier_4', 'tier_3', 'tier_2', 'tier_1']

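# create a new column and use np.select to assign values to it using our lists as arguments;
# default only applies to rows that match no condition, and the placeholder string 'unknown'
# keeps it the same type as the string labels
df['tier'] = np.select(conditions, values, default='unknown')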

# display updated DataFrame
df.head()

Or something like this?

Answer 2 (score: 0)

df = student
print(df)

#df['attendence_score'] = np.where((df['absences'] =0 ) ,5, df['attendence_score'])
#df.loc[df['absences'] = 0, 'attendence_score'] = 5

attendence_score = [
    (df['absences'] == 0),
    (df['absences'] > 0) & (df['absences'] <= 5),
    (df['absences'] > 5) & (df['absences'] <= 10),
    (df['absences'] > 10) & (df['absences'] <= 15),
    (df['absences'] > 15) & (df['absences'] <= 20),
    (df['absences'] > 20)
    ]

# create a list of the values we want to assign for each condition
# (integers rather than strings, so the scores stay numeric)
values = [5, 4, 3, 2, 1, 0]

# create a new column and use np.select to assign values to it using our lists as arguments
df['attendence_score'] = np.select(attendence_score, values)

# display updated DataFrame
df.head()
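Since `np.select` takes the first condition that matches for each row, the list can probably be shortened to cumulative thresholds, with `default` covering the last bracket. A minimal sketch on made-up data, assuming absences are never negative or missing:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; assumes non-negative, non-missing absence counts.
df = pd.DataFrame({'absences': [0, 3, 5, 6, 10, 13, 18, 20, 21, 30]})
s = df['absences']

# Conditions are checked in order and the first match wins,
# so each one only needs an upper bound.
conditions = [s == 0, s <= 5, s <= 10, s <= 15, s <= 20]
choices = [5, 4, 3, 2, 1]

# Anything that matches none of the conditions (21 or more) gets the default.
df['attendence_score'] = np.select(conditions, choices, default=0)
print(df)
```

The order of the conditions matters here: if `s <= 20` came first, every row up to 20 would get a score of 1.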

I figured it out myself. I love myself!!!