Please consider the following exampleDF:
name age sex
a 21 male
b 13 female
c 56 female
d 12 male
e 45 nan
f 10 female
I want to create a new column, newColumn, from age and sex: if age < 15 it should be child, otherwise it should equal sex.
I tried this:
exampleDF['newColumn'] = exampleDF[['age','sex']].apply(lambda age,sex: 'child' if age < 15 else sex)
but I get an error.
Please help me fix my mistake.
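A minimal sketch that rebuilds the question's data and reproduces the failing call. It assumes the missing sex value for row e is NaN (np.nan). It shows that DataFrame.apply hands the lambda a single Series per call, so a lambda with two parameters raises a TypeError.

```python
import numpy as np
import pandas as pd

# The question's exampleDF; the missing sex for row e is assumed to be NaN.
exampleDF = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'age': [21, 13, 56, 12, 45, 10],
    'sex': ['male', 'female', 'female', 'male', np.nan, 'female'],
})

raised = False
try:
    # apply (default axis=0) calls the lambda once per column with ONE
    # argument, a Series, so the second parameter `sex` is never filled.
    exampleDF[['age', 'sex']].apply(lambda age, sex: 'child' if age < 15 else sex)
except TypeError as exc:
    raised = True
    print('TypeError:', exc)
```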
Answer 0 (score: 3)
I think a better option is mask: where the boolean mask is True, the new column gets the string child; otherwise it takes the value from the sex column:
print (exampleDF['age'] < 15)
0 False
1 True
2 False
3 True
4 False
5 True
Name: age, dtype: bool
exampleDF['newColumn'] = exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
print (exampleDF)
name age sex newColumn
0 a 21 male male
1 b 13 female child
2 c 56 female female
3 d 12 male child
4 e 45 NaN NaN
5 f 10 female child
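A self-contained sketch of the same mask approach, rebuilding the question's data (row e assumed to be NaN). It also shows that mask and where are complements: s.mask(cond, other) gives the same result as s.where(~cond, other), because mask replaces values where the condition is True while where keeps them.

```python
import numpy as np
import pandas as pd

exampleDF = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'age': [21, 13, 56, 12, 45, 10],
    'sex': ['male', 'female', 'female', 'male', np.nan, 'female'],
})

# Boolean Series aligned on the DataFrame index.
is_child = exampleDF['age'] < 15

# mask: replace with 'child' where is_child is True, keep 'sex' elsewhere.
exampleDF['newColumn'] = exampleDF['sex'].mask(is_child, 'child')

# where is the mirror image: it keeps values where its condition is True.
via_where = exampleDF['sex'].where(~is_child, 'child')

print(exampleDF)
print(exampleDF['newColumn'].equals(via_where))
```

The NaN in row e passes through unchanged because age 45 does not satisfy the condition.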
The main advantage of this solution is that it is faster:
# small df, 6 rows
In [63]: %timeit exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
1000 loops, best of 3: 517 µs per loop
In [64]: %timeit exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
1000 loops, best of 3: 867 µs per loop
# bigger df, 6k rows
exampleDF = pd.concat([exampleDF]*1000).reset_index(drop=True)
In [66]: %timeit exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
The slowest run took 5.41 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 589 µs per loop
In [67]: %timeit exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
10 loops, best of 3: 104 ms per loop
# bigger df, 60k rows: apply is very slow
exampleDF = pd.concat([exampleDF]*10000).reset_index(drop=True)
In [69]: %timeit exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
1000 loops, best of 3: 1.23 ms per loop
In [70]: %timeit exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
1 loop, best of 3: 1.03 s per loop
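The %timeit lines above require IPython. A rough equivalent using the standard-library timeit module is sketched below; make_df, with_mask and with_apply are hypothetical helper names, and the timings it prints depend on the machine and pandas version, so they will not reproduce the numbers above. It first checks that both approaches return the same values, then times them.

```python
import timeit

import numpy as np
import pandas as pd


def make_df(n_copies):
    """Hypothetical helper: tile the question's 6-row frame n_copies times."""
    base = pd.DataFrame({
        'name': ['a', 'b', 'c', 'd', 'e', 'f'],
        'age': [21, 13, 56, 12, 45, 10],
        'sex': ['male', 'female', 'female', 'male', np.nan, 'female'],
    })
    return pd.concat([base] * n_copies).reset_index(drop=True)


def with_mask(df):
    # Vectorized: one boolean comparison plus one replace over the column.
    return df['sex'].mask(df['age'] < 15, 'child')


def with_apply(df):
    # Row by row: the lambda runs once per row in Python.
    return df[['age', 'sex']].apply(
        lambda x: 'child' if x['age'] < 15 else x['sex'], axis=1)


for n_copies in (1, 1000):
    df = make_df(n_copies)
    # Both approaches must agree before their speed is compared.
    assert with_mask(df).equals(with_apply(df))
    for func in (with_mask, with_apply):
        best = min(timeit.repeat(lambda: func(df), number=3, repeat=3)) / 3
        print(f'{len(df):>6} rows  {func.__name__:<10} {best * 1e3:.2f} ms per call')
```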
Answer 1 (score: 0)
This will do the job:
import pandas as pd
exampleDF=pd.DataFrame({'name':['a','b','c','d','e','f'],'age':[21,13,56,12,45,10],'sex':['male','female','female','male',None,'male']})
exampleDF['newColumn'] = exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
Then exampleDF is:
age name sex newColumn
0 21 a male male
1 13 b female child
2 56 c female female
3 12 d male child
4 45 e None None
5 10 f male child
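In your code you try to define lambda age,sex:, but that cannot work: apply calls the function with a single argument (a Series), and exampleDF[['age','sex']] is one DataFrame with two columns, not two separate columns. The solution above instead takes one argument x and reads x['age'] and x['sex'] from it. You also need to pass axis=1 so that each call receives a row rather than a column.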
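If you would rather keep a function with two separate parameters, as in the original attempt, one option is to unpack each row inside the lambda passed to apply with axis=1. This is a sketch reusing the DataFrame from this answer; the helper name label is hypothetical.

```python
import pandas as pd

# The DataFrame from this answer (row e has sex None, row f is male).
exampleDF = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'age': [21, 13, 56, 12, 45, 10],
    'sex': ['male', 'female', 'female', 'male', None, 'male'],
})


def label(age, sex):
    """Hypothetical helper: 'child' under 15, otherwise the sex value."""
    return 'child' if age < 15 else sex


# With axis=1, apply passes one row (a Series) per call;
# the lambda unpacks it into the two arguments label() expects.
exampleDF['newColumn'] = exampleDF.apply(
    lambda row: label(row['age'], row['sex']), axis=1)

print(exampleDF[['name', 'age', 'sex', 'newColumn']])
```

This is still a row-by-row apply, so it has the same speed cost as the solution above on large frames.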