Applying a function using 2 columns in pandas

Date: 2017-04-18 03:10:25

Tags: python-3.x pandas

Consider the following "exampleDF".

name  age     sex
a      21    male
b      13  female
c      56  female
d      12    male
e      45     nan
f      10  female

I want to create a new column from age and sex: if age < 15, newColumn should be 'child'; otherwise it should equal sex.

I tried this:

exampleDF['newColumn'] = exampleDF[['age','sex']].apply(lambda age,sex: 'child' if age < 15 else sex)

But I get an error.

Please help me fix my mistake.

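2 answers:

Answer 0: (score: 3)

I think it is better to use mask: where the boolean mask is True, the new column gets the string 'child'; where it is False, it keeps the value from the sex column: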

print (exampleDF['age'] < 15)
0    False
1     True
2    False
3     True
4    False
5     True
Name: age, dtype: bool


exampleDF['newColumn'] = exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
print (exampleDF)
  name  age     sex newColumn
0    a   21    male      male
1    b   13  female     child
2    c   56  female    female
3    d   12    male     child
4    e   45     NaN       NaN
5    f   10  female     child
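
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the mask approach. The frame is rebuilt from the table in the question; using None for the missing sex value and the helper name is_child are assumptions made for illustration.

```python
import pandas as pd

# Rebuild the example frame from the question.
# None stands in for the missing sex value (an assumption).
exampleDF = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'age': [21, 13, 56, 12, 45, 10],
    'sex': ['male', 'female', 'female', 'male', None, 'female'],
})

# Boolean mask: True for rows where age is below 15.
is_child = exampleDF['age'] < 15

# Series.mask replaces values where the condition is True
# and keeps the original values everywhere else.
exampleDF['newColumn'] = exampleDF['sex'].mask(is_child, 'child')

print(exampleDF)
```

Rows with a missing sex and age >= 15 stay missing, because mask only touches the positions where the condition is True.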

The main advantage of this solution is that it is faster:

#small 6 rows df
In [63]: %timeit exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
1000 loops, best of 3: 517 µs per loop

In [64]: %timeit exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
1000 loops, best of 3: 867 µs per loop
#bigger 6k df
exampleDF = pd.concat([exampleDF]*1000).reset_index(drop=True)

In [66]: %timeit exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
The slowest run took 5.41 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 589 µs per loop

In [67]: %timeit exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
10 loops, best of 3: 104 ms per loop
#bigger 60k df (concat applied to the original 6-row df) - apply is very slow
exampleDF = pd.concat([exampleDF]*10000).reset_index(drop=True)

In [69]: %timeit exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
1000 loops, best of 3: 1.23 ms per loop

In [70]: %timeit exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
1 loop, best of 3: 1.03 s per loop
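
The %timeit lines above are IPython magics. Outside IPython, a rough comparison can be sketched with the standard-library timeit module. The helper names with_mask and with_apply, the 6,000-row size, and the repeat count are assumptions chosen for illustration; absolute timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

small = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'age': [21, 13, 56, 12, 45, 10],
    'sex': ['male', 'female', 'female', 'male', None, 'female'],
})

# Repeat the 6-row frame 1000 times to get 6,000 rows.
big = pd.concat([small] * 1000).reset_index(drop=True)


def with_mask(df):
    # Vectorized: one comparison and one mask over whole columns.
    return df['sex'].mask(df['age'] < 15, 'child')


def with_apply(df):
    # Row-wise: the lambda is called once per row in Python.
    return df[['age', 'sex']].apply(
        lambda row: 'child' if row['age'] < 15 else row['sex'], axis=1)


runs = 3
t_mask = timeit.timeit(lambda: with_mask(big), number=runs) / runs
t_apply = timeit.timeit(lambda: with_apply(big), number=runs) / runs

print(f"mask:  {t_mask:.6f} s per call")
print(f"apply: {t_apply:.6f} s per call")
```

The gap grows with the number of rows because apply with axis=1 pays Python function-call overhead for every row, while mask works on whole columns at once.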

Answer 1: (score: 0)

This will do the job:

import pandas as pd
exampleDF=pd.DataFrame({'name':['a','b','c','d','e','f'],'age':[21,13,56,12,45,10],'sex':['male','female','female','male',None,'male']})
exampleDF['newColumn'] = exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)

Then exampleDF is:

    age name    sex     newColumn
0   21  a       male    male
1   13  b       female  child
2   56  c       female  female
3   12  d       male    child
4   45  e       None    None
5   10  f       male    child

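In your code you tried to define lambda age,sex:, but that cannot work, because exampleDF[['age','sex']] is a DataFrame with two columns (not two separate columns), and apply passes a single argument to the function on each call. The solution above fixes this by taking one argument x and indexing it by column name. You also need to specify the axis: with axis=1 the function receives one row at a time, whereas the default (axis=0) passes one column at a time.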
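
The explanation above can be sketched as follows. The function names label_row and label_pair are hypothetical, and the zip variant is an alternative shown for comparison, not something either answer proposes. The first part only demonstrates that a two-argument lambda raises a TypeError when apply calls it with one argument; it does not claim to reproduce the exact message from the question.

```python
import pandas as pd

exampleDF = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'age': [21, 13, 56, 12, 45, 10],
    'sex': ['male', 'female', 'female', 'male', None, 'female'],
})

# apply calls the function with ONE argument (a column by default),
# so a two-argument lambda fails before its body ever runs.
failed = False
try:
    exampleDF[['age', 'sex']].apply(
        lambda age, sex: 'child' if age < 15 else sex)
except TypeError as exc:
    failed = True
    print(type(exc).__name__)


# With axis=1 the single argument is a row, indexable by column name.
def label_row(row):
    return 'child' if row['age'] < 15 else row['sex']


via_apply = exampleDF[['age', 'sex']].apply(label_row, axis=1)


# A genuinely two-argument function can be fed by zipping the columns.
def label_pair(age, sex):
    return 'child' if age < 15 else sex


via_zip = [label_pair(age, sex)
           for age, sex in zip(exampleDF['age'], exampleDF['sex'])]

print(via_apply.tolist())
print(via_zip)
```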