Question

Consider the following "exampleDF".

name  age  sex
a     21   male
b     13   female
c     56   female
d     12   male
e     45   nan
f     10   female

I want to create a new column from age and sex: if age < 15, newColumn should be child; otherwise it should equal sex.

I tried this:

exampleDF['newColumn'] = exampleDF[['age','sex']].apply(lambda age,sex: 'child' if age < 15 else sex)

But I get an error.

Please help me fix my mistake.

Answer 1

I think it is better to use mask: where the boolean mask is True, the new column gets the string child; otherwise it takes the value from the sex column:

print (exampleDF['age'] < 15)
0    False
1     True
2    False
3     True
4    False
5     True
Name: age, dtype: bool


exampleDF['newColumn'] = exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
print (exampleDF)
  name  age     sex newColumn
0    a   21    male      male
1    b   13  female     child
2    c   56  female    female
3    d   12    male     child
4    e   45     NaN       NaN
5    f   10  female     child
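
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frame construction is my assumption, rebuilt from the table in the question (with NaN for the missing sex); the name is_child is only illustrative. It also shows Series.where, which does the same thing with the condition inverted.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the frame from the question.
exampleDF = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'age': [21, 13, 56, 12, 45, 10],
    'sex': ['male', 'female', 'female', 'male', np.nan, 'female'],
})

# Boolean mask: True for rows that should become 'child'.
is_child = exampleDF['age'] < 15

# mask replaces values where the condition is True and keeps the rest.
exampleDF['newColumn'] = exampleDF['sex'].mask(is_child, 'child')

# where keeps values where the condition is True, so the condition is negated.
via_where = exampleDF['sex'].where(~is_child, 'child')

print(exampleDF)
```

Rows where sex is missing and age is 15 or more stay NaN in both variants, since only the rows selected by the mask are overwritten.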

The main advantage of this solution is that it is faster:

#small 6 rows df
In [63]: %timeit exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
1000 loops, best of 3: 517 µs per loop

In [64]: %timeit exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
1000 loops, best of 3: 867 µs per loop

#bigger 6k df
exampleDF = pd.concat([exampleDF]*1000).reset_index(drop=True)

In [66]: %timeit exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
The slowest run took 5.41 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 589 µs per loop

In [67]: %timeit exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
10 loops, best of 3: 104 ms per loop

#bigger 60k df (assumes exampleDF is the original 6-row frame again) - apply very slow
exampleDF = pd.concat([exampleDF]*10000).reset_index(drop=True)

In [69]: %timeit exampleDF['sex'].mask(exampleDF['age'] < 15, 'child')
1000 loops, best of 3: 1.23 ms per loop

In [70]: %timeit exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)
1 loop, best of 3: 1.03 s per loop
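
The %timeit lines above only work in IPython. A rough way to repeat the comparison in a plain script might look like the sketch below. The helper names with_mask and with_apply, the frame construction, and the repeat count are my assumptions; the numbers will differ by machine and pandas version, so none are shown.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed reconstruction of the 6-row frame from the question.
small = pd.DataFrame({
    'name': ['a', 'b', 'c', 'd', 'e', 'f'],
    'age': [21, 13, 56, 12, 45, 10],
    'sex': ['male', 'female', 'female', 'male', np.nan, 'female'],
})
big = pd.concat([small] * 1000).reset_index(drop=True)  # 6k rows


def with_mask(df):
    # Vectorized: one comparison and one replacement over whole columns.
    return df['sex'].mask(df['age'] < 15, 'child')


def with_apply(df):
    # Row by row: the lambda is called once per row in Python.
    return df[['age', 'sex']].apply(
        lambda x: 'child' if x['age'] < 15 else x['sex'], axis=1)


# Check that both approaches agree (NaN filled so it compares equal).
same = bool((with_mask(big).fillna('missing')
             == with_apply(big).fillna('missing')).all())

mask_seconds = timeit.timeit(lambda: with_mask(big), number=3)
apply_seconds = timeit.timeit(lambda: with_apply(big), number=3)
print('same result:', same)
print('mask: %.4f s, apply: %.4f s' % (mask_seconds, apply_seconds))
```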

Answer 2

This will do the job:

import pandas as pd
exampleDF=pd.DataFrame({'name':['a','b','c','d','e','f'],'age':[21,13,56,12,45,10],'sex':['male','female','female','male',None,'male']})
exampleDF['newColumn'] = exampleDF[['age','sex']].apply(lambda x: 'child' if x['age'] < 15 else x['sex'],axis=1)

Then exampleDF is:

    age name    sex     newColumn
0   21  a       male    male
1   13  b       female  child
2   56  c       female  female
3   12  d       male    child
4   45  e       None    None
5   10  f       male    child

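In your code you tried to define lambda age,sex:, but that cannot work, because exampleDF[['age','sex']] is a DataFrame with two columns (not two separate columns), and apply passes a single Series to the function on each call. The solution above fixes this by taking one argument and indexing into it. You also need to specify the axis: with axis=1 the function receives one row at a time.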
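
To make that concrete, here is a small sketch of what apply actually hands to the function. The two-row frame and the variable names are hypothetical, chosen only for illustration; the last line shows a plain zip as one alternative if you really want separate age and sex arguments.

```python
import pandas as pd

# Hypothetical two-row frame, just large enough to show the behaviour.
df = pd.DataFrame({'age': [21, 13], 'sex': ['male', 'female']})

# Default axis=0: the function receives one whole column (a Series) per call.
per_column = df.apply(lambda col: col.name)

# A two-argument lambda therefore fails: only one argument is ever passed.
try:
    df.apply(lambda age, sex: 'child' if age < 15 else sex)
    raised = False
except TypeError:
    raised = True

# axis=1: the function receives one row (a Series indexed by column name).
per_row = df.apply(
    lambda row: 'child' if row['age'] < 15 else row['sex'], axis=1)

# If separate arguments are preferred, zip the columns instead of using apply.
zipped = ['child' if age < 15 else sex
          for age, sex in zip(df['age'], df['sex'])]
```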

Applying a function using 2 columns in pandas
