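I have a dataframe of people with an age column. I want to match each age to an age group, for example: Baby = 0-2 years, Child = 3-12 years, Young = 13-18 years, Young Adult = 19-30 years, Adult = 31-50 years, Senior Adult = 51-65 years.
I have created lists that define these age groups, e.g. Adult=list(range(31,51)), and so on.
How can I create a new column that holds the name of the matching list (e.g. "Adult") for each row of the dataframe?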
Small input: the dataframe has three columns: df['Name'], df['Country'], df['Age'].
Name Country Age
Anthony France 15
Albert Belgium 54
.
.
.
Zahra Tunisia 14
So I need to match the Age column against the lists I already have. The output should look like this:
Name Country Age Group
Anthony France 15 Young
Albert Belgium 54 Adult
.
.
.
Zahra Tunisia 14 Young
Thanks!
Answer 0 (score: 1)
Here is one way to do this using pd.cut:
import numpy as np
import pandas as pd

df = pd.DataFrame({"person_id": range(25), "age": np.random.randint(0, 100, 25)})
print(df.head(10))
==>
person_id age
0 0 30
1 1 42
2 2 78
3 3 2
4 4 44
5 5 43
6 6 92
7 7 3
8 8 13
9 9 76
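# include_lowest=True makes the first bin [0, 18] so that age 0 is not dropped as NaN
df["group"] = pd.cut(df.age, [0, 18, 50, 100], labels=["child", "adult", "senior"], include_lowest=True)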
print(df.head(10))
==>
person_id age group
0 0 30 adult
1 1 42 adult
2 2 78 senior
3 3 2 child
4 4 44 adult
5 5 43 adult
6 6 92 senior
7 7 3 child
8 8 13 child
9 9 76 senior
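A minimal sketch applying the same pd.cut idea to the six groups from the question. The bin edges are assumed from the age ranges stated there (0-2, 3-12, 13-18, 19-30, 31-50, 51-65); pd.cut intervals are right-closed by default, and include_lowest=True keeps age 0 in the first bin. Ages above 65 fall outside every bin and become NaN.

```python
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({
    "Name": ["Anthony", "Albert", "Zahra"],
    "Country": ["France", "Belgium", "Tunisia"],
    "Age": [15, 54, 14],
})

# Right-closed edges: [0, 2], (2, 12], (12, 18], (18, 30], (30, 50], (50, 65]
bins = [0, 2, 12, 18, 30, 50, 65]
labels = ["Baby", "Child", "Young", "Young Adult", "Adult", "Senior Adult"]

# include_lowest=True closes the first interval on the left, so age 0 -> "Baby"
df["Group"] = pd.cut(df["Age"], bins=bins, labels=labels, include_lowest=True)
print(df)
```

The resulting Group column is categorical; call .astype(str) on it if plain strings are needed.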
Based on your question, if you already have several lists (like the ones below) and want to turn them into bins for pd.cut, you can do the following:
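# for example, these are the lists
Adult = list(range(18, 50))
Child = list(range(0, 18))
Senior = list(range(50, 100))
# Creating bins out of the lists (the lists must be given in ascending order).
groups = [Child, Adult, Senior]
bins = [min(l) for l in groups]
# One past the largest age, because with right=False each bin is [lo, hi)
bins.append(max(max(l) for l in groups) + 1)
labels = ["Child", "Adult", "Senior"]
# using the bins; right=False makes [0, 18) match range(0, 18), and so on
df["group"] = pd.cut(df.age, bins, labels=labels, right=False)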
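The same list-to-bins conversion can be wrapped in a small helper for the six groups from the question. This is a sketch under stated assumptions: the helper name bins_from_groups and the dict groups are hypothetical, each list is assumed to be a contiguous range of integers, and the lists are assumed to leave no gaps between groups.

```python
import pandas as pd

# Age-group lists in the style of the question (each a contiguous range of ints)
groups = {
    "Baby": list(range(0, 3)),
    "Child": list(range(3, 13)),
    "Young": list(range(13, 19)),
    "Young Adult": list(range(19, 31)),
    "Adult": list(range(31, 51)),
    "Senior Adult": list(range(51, 66)),
}


def bins_from_groups(groups):
    """Hypothetical helper: turn contiguous integer lists into left-closed bin edges."""
    # Sort groups by their smallest age so the edges are increasing
    ordered = sorted(groups.items(), key=lambda item: min(item[1]))
    edges = [min(ages) for _, ages in ordered]
    # One past the last age, because the intervals are [lo, hi)
    edges.append(max(ordered[-1][1]) + 1)
    labels = [name for name, _ in ordered]
    return edges, labels


edges, labels = bins_from_groups(groups)
df = pd.DataFrame({"Age": [0, 2, 3, 18, 19, 50, 51, 65]})
# right=False makes every interval [lo, hi), matching range(lo, hi) exactly
df["Group"] = pd.cut(df["Age"], bins=edges, labels=labels, right=False)
print(df)
```

Using right=False means the boundary ages (2/3, 18/19, 50/51) land in the same group as in the original lists.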
Answer 1 (score: 1)
IIUC, I would go with np.select:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Age': [3, 20, 40]})
condlist = [df.Age.between(0, 2),
            df.Age.between(3, 12),
            df.Age.between(13, 18),
            df.Age.between(19, 30),
            df.Age.between(31, 50),
            df.Age.between(51, 65)]
choicelist = ['Baby', 'Child', 'Young',
              'Young Adult', 'Adult', 'Senior Adult']
# np.select's default is 0; a string default avoids mixing str and int,
# and ages outside 0-65 get 'Unknown'
df['Group'] = np.select(condlist, choicelist, default='Unknown')
Output:
Age Group
0 3 Child
1 20 Young Adult
2 40 Adult
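Since the question already has the age lists, a possible variant builds the np.select conditions from the lists themselves with Series.isin instead of hard-coding the between() bounds. A sketch, assuming the lists are collected in a dict named groups (a hypothetical name) whose keys are the labels to write into the new column:

```python
import numpy as np
import pandas as pd

# Age-group lists in the style of the question (ranges assumed from its age bounds)
groups = {
    "Baby": list(range(0, 3)),
    "Child": list(range(3, 13)),
    "Young": list(range(13, 19)),
    "Young Adult": list(range(19, 31)),
    "Adult": list(range(31, 51)),
    "Senior Adult": list(range(51, 66)),
}

df = pd.DataFrame({"Age": [3, 20, 40, 70]})

# One boolean condition per list: is the age a member of that list?
condlist = [df["Age"].isin(ages) for ages in groups.values()]
choicelist = list(groups.keys())
# The first matching condition wins; ages in no list get the default
df["Group"] = np.select(condlist, choicelist, default="Unknown")
print(df)
```

Adding or changing a group then only means editing the dict, not the condition list.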
Answer 2 (score: 1)
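To make it clearer for beginners, you can define a function that returns the age group for each person, and then use DataFrame.apply() to apply that function and fill our 'Group' column: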
import pandas as pd

def age(row):
    """Return the age group for one row (None if the age is outside 0-65)."""
    a = row['Age']
    if 0 <= a <= 2:
        return 'Baby'
    elif 2 < a <= 12:
        return 'Child'
    elif 12 < a <= 18:
        return 'Young'
    elif 18 < a <= 30:
        return 'Young Adult'
    elif 30 < a <= 50:
        return 'Adult'
    elif 50 < a <= 65:
        return 'Senior Adult'
    return None

df = pd.DataFrame({'Name': ['Anthony', 'Albert', 'Zahra'],
                   'Country': ['France', 'Belgium', 'Tunisia'],
                   'Age': [15, 54, 14]})
# axis=1 passes each row (as a Series) to age()
df['Group'] = df.apply(age, axis=1)
print(df)
Output:
Name Country Age Group
0 Anthony France 15 Young
1 Albert Belgium 54 Senior Adult
2 Zahra Tunisia 14 Young
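Another option that uses the question's lists directly is to invert them into a single age-to-name lookup dict and apply it with Series.map, which avoids calling a Python function once per row. A sketch, assuming the lists are gathered in a dict named groups (hypothetical) keyed by the label to put in the new column:

```python
import pandas as pd

# Lists keyed by the group name to put in the new column (assumed layout)
groups = {
    "Baby": list(range(0, 3)),
    "Child": list(range(3, 13)),
    "Young": list(range(13, 19)),
    "Young Adult": list(range(19, 31)),
    "Adult": list(range(31, 51)),
    "Senior Adult": list(range(51, 66)),
}

# Invert the lists into one lookup table: age -> group name
age_to_group = {age: name for name, ages in groups.items() for age in ages}

df = pd.DataFrame({"Name": ["Anthony", "Albert", "Zahra"],
                   "Country": ["France", "Belgium", "Tunisia"],
                   "Age": [15, 54, 14]})

# Series.map looks each age up in the dict; ages in no list become NaN
df["Group"] = df["Age"].map(age_to_group)
print(df)
```

If the lists overlapped, the later group in the dict would overwrite the earlier one for the shared ages, so this assumes non-overlapping lists.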