def age_range(age):
    if age <= 18:
        return 'Minors'
    elif age >= 19 & age < 63:
        return 'Adults'
    elif age >= 63 & age < 101:
        return 'Senior Citizen'
    else:
        return 'Age Unknown'

titanic_data_df["PassengerType"] = titanic_data_df[['Age']].apply(age_range, axis = 1)
titanic_data_df.head()
When I try to add a new column to the existing DataFrame (titanic_data_df), I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-466-741f5646101e> in <module>()
1 #create a new df with just age and distinguish each passenger as minor, adult or senior citizen
----> 2 titanic_data_df["PassengerType"] = titanic_data_df[['Age']].apply(age_range, axis = 1)
3
4 titanic_data_df.head()
C:\Users\test\Anaconda2\envs\py27DAND\lib\site-packages\pandas\core\frame.pyc in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
4161 if reduce is None:
4162 reduce = True
-> 4163 return self._apply_standard(f, axis, reduce=reduce)
4164 else:
4165 return self._apply_broadcast(f, axis)
C:\Users\test\Anaconda2\envs\py27DAND\lib\site-packages\pandas\core\frame.pyc in _apply_standard(self, func, axis, ignore_failures, reduce)
4257 try:
4258 for i, v in enumerate(series_gen):
-> 4259 results[i] = func(v)
4260 keys.append(v.name)
4261 except Exception as e:
<ipython-input-465-e62ccbeee80e> in age_range(age)
1 def age_range(age):
----> 2 if age <= 18:
3 return 'Minors'
4 elif age >= 19 & age < 63:
5 return 'Adults'
C:\Users\test\Anaconda2\envs\py27DAND\lib\site-packages\pandas\core\generic.pyc in __nonzero__(self)
915 raise ValueError("The truth value of a {0} is ambiguous. "
916 "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
--> 917 .format(self.__class__.__name__))
918
919 __bool__ = __nonzero__
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', u'occurred at index 0')
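From what I have read so far, it has something to do with the if ... else statements in my function above, but I can't figure out what exactly. Any help is appreciated. Thanks.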
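Answer 0 (score: 1)
When you select the column with titanic_data_df[['Age']] (note the double square brackets), you actually get a DataFrame containing a single column. In that case, apply() with axis = 1 passes a single-element Series to age_range for each row, so the test if age <= 18 asks for the truth value of a Series, which is what raises the ValueError.
Try this instead: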
titanic_data_df["PassengerType"] = titanic_data_df['Age'].apply(age_range)
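To make the difference concrete, here is a minimal sketch on a small made-up frame (the ages are illustrative, not the real Titanic data, and df is a hypothetical name). It also swaps the & in the question's function for chained comparisons: & is the bitwise operator and binds tighter than >= and <, so age >= 19 & age < 63 is parsed as age >= (19 & age) < 63, which is probably not what was intended.

```python
import pandas as pd

# Hypothetical sample data, not the real Titanic dataset.
df = pd.DataFrame({"Age": [13, 38, 72, None]})

# Double brackets select a one-column DataFrame; single brackets select a Series.
print(type(df[["Age"]]).__name__)
print(type(df["Age"]).__name__)


def age_range(age):
    # Chained comparisons work on scalars and avoid the precedence
    # problem of the bitwise `&` operator.
    if age <= 18:
        return "Minors"
    elif 19 <= age < 63:
        return "Adults"
    elif 63 <= age < 101:
        return "Senior Citizen"
    else:
        # Missing ages (NaN) fail every comparison and end up here.
        return "Age Unknown"


# Series.apply passes one scalar at a time, so the `if` tests are unambiguous.
df["PassengerType"] = df["Age"].apply(age_range)
print(df)
```

Because Series.apply hands the function one scalar per call, no axis argument is needed.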
Answer 1 (score: 0)
The pandas cut function makes this easier. First, I'll build a DataFrame to demonstrate the cut function.
import pandas as pd

titanic_data_df = pd.DataFrame(data=[[13, 'Male'], [14, 'Female'], [38, 'Female'], [72, 'Male'], [33, 'Female'], [80, 'Male'], [34, 'Male'], [15, 'Female'], [27, 'Female'],[23, 'Male'], [64, 'Female'], [38, 'Female'], [12, 'Male'], [32, 'Female'], [21, 'Male'], [66, 'Male'], [73, 'Female'], [22, 'Female']], columns=['Age', 'Sex'])
print(titanic_data_df)
Age Sex
0 13 Male
1 14 Female
2 38 Female
3 72 Male
4 33 Female
5 80 Male
6 34 Male
7 15 Female
8 27 Female
9 23 Male
10 64 Female
11 38 Female
12 12 Male
13 32 Female
14 21 Male
15 66 Male
16 73 Female
17 22 Female
Then I simply apply the cut function:
labels = ['Minors', 'Adults', 'Senior Citizen']
titanic_data_df["PassengerType"] = pd.cut(titanic_data_df.Age, [0, 18, 63, 101], labels=labels)
print(titanic_data_df)
Age Sex PassengerType
0 13 Male Minors
1 14 Female Minors
2 38 Female Adults
3 72 Male Senior Citizen
4 33 Female Adults
5 80 Male Senior Citizen
6 34 Male Adults
7 15 Female Minors
8 27 Female Adults
9 23 Male Adults
10 64 Female Senior Citizen
11 38 Female Adults
12 12 Male Minors
13 32 Female Adults
14 21 Male Adults
15 66 Male Senior Citizen
16 73 Female Senior Citizen
17 22 Female Adults
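One caveat: by default cut uses right-closed intervals, so the edges [0, 18, 63, 101] produce (0, 18], (18, 63] and (63, 101]. An age of exactly 63 therefore lands in Adults, whereas the age_range function in the question treats 63 as a senior. The sketch below is one possible way to line the two up for whole-number ages, using right=False with shifted edges. The sample ages and the "Age Unknown" fallback for missing values are my own assumptions for illustration.

```python
import pandas as pd

# Illustrative boundary values plus one missing age.
ages = pd.Series([18, 19, 62, 63, 100, None], name="Age")

labels = ["Minors", "Adults", "Senior Citizen"]

# right=False makes the intervals left-closed: [0, 19), [19, 63), [63, 101).
passenger_type = pd.cut(ages, bins=[0, 19, 63, 101], labels=labels, right=False)

# Missing or out-of-range ages come back as NaN; give them an explicit label.
# The new category must be registered before fillna can use it.
passenger_type = passenger_type.cat.add_categories("Age Unknown").fillna("Age Unknown")

print(passenger_type.tolist())
```

Unlike the question's function, negative ages fall outside every bin here and would also be labelled "Age Unknown" rather than "Minors".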