Given a pandas DataFrame with multiple columns:
pd.DataFrame({'name': ['Bob', 'Alice'], 'age': [20, 40], 'height': [2.0, 2.1]})
name age height
0 Bob 20 2.0
1 Alice 40 2.1
and a function with multiple arguments:
def example_hash(name: str, age: int) -> str:
    return "In 10 years {} will be {}".format(name, age + 10)
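How can I update the DataFrame with an additional column that contains the result of applying the function to a subset of the other columns?
The resulting DataFrame should be the result of applying example_hash to the name and age columns: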
name age height hash
0 Bob 20 2.0 In 10 years Bob will be 30
1 Alice 40 2.1 In 10 years Alice will be 50
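I'm interested in a pandas-centric answer. I understand that I could construct a Python list, iterate over the rows, and append each result to that list, which would eventually become the column.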
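For reference, the list-building approach described above might look like the minimal sketch below. It assumes the DataFrame and the two-argument example_hash from the question; iterating with zip over the two relevant columns is one possible choice, not something the question specifies.

```python
import pandas as pd


def example_hash(name: str, age: int) -> str:
    return "In 10 years {} will be {}".format(name, age + 10)


df = pd.DataFrame({'name': ['Bob', 'Alice'], 'age': [20, 40], 'height': [2.0, 2.1]})

# Build a plain Python list by walking the two relevant columns in lockstep.
hashes = []
for name, age in zip(df['name'], df['age']):
    hashes.append(example_hash(name, age))

# Assign the finished list as a new column; its length must equal len(df).
df['hash'] = hashes
print(df)
```

This works, but it is exactly the non-pandas loop the question hopes to avoid; the answers below keep the iteration inside pandas or NumPy calls.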
Thanks in advance for your consideration and answers.
Answer 0 (score: 2)
You can use the apply function with axis=1 to iterate over the rows and add a new column.
In [139]: df = pd.DataFrame({'name': ['Bob', 'Alice'], 'age': [20, 40], 'height': [2.0, 2.1]})
In [140]: df
Out[140]:
name age height
0 Bob 20 2.0
1 Alice 40 2.1
In [142]: def example_hash(row):
...: row['hash']= "In 10 years {} will be {}".format(row['name'], row['age']+10)
...: return row
...:
In [143]: df = df.apply(example_hash,axis=1)
In [144]: df
Out[144]:
name age height hash
0 Bob 20 2.0 In 10 years Bob will be 30
1 Alice 40 2.1 In 10 years Alice will be 50
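To see what apply with axis=1 hands to the function, the sketch below records the label and index of each row it receives. The seen list and the record_row helper are hypothetical names for illustration. Each call gets one row as a Series indexed by the column labels, which is why the answer can read row['name'] and add a new row['hash'] label before returning the row. Because pandas may reuse a single Series object between calls, the sketch stores plain values rather than the row itself, and (as a cautious design choice, not something the answer requires) it modifies a copy.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Alice'], 'age': [20, 40], 'height': [2.0, 2.1]})

seen = []  # hypothetical: (row label, column labels) for each call


def record_row(row):
    # With axis=1, `row` is a Series whose index is the DataFrame's column labels
    # and whose .name is the row's index label.
    seen.append((row.name, row.index.tolist()))
    row = row.copy()  # work on a copy rather than the object apply passed in
    row['hash'] = "In 10 years {} will be {}".format(row['name'], row['age'] + 10)
    return row


# Returning a Series from each call makes apply assemble a new DataFrame,
# and the added 'hash' label becomes a new column.
result = df.apply(record_row, axis=1)
print(result)
```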
Answer 1 (score: 2)
You can do this without changing the example_hash() method:
Using np.vectorize:
In [2204]: import numpy as np
In [2200]: def example_hash(name: str, age: int) -> str:
...: return "In 10 years {} will be {}".format(name, age+10)
...:
In [2202]: df['new'] = np.vectorize(example_hash)(df['name'], df['age'])
In [2203]: df
Out[2203]:
name age height new
0 Bob 20 2.0 In 10 years Bob will be 30
1 Alice 40 2.1 In 10 years Alice will be 50
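A self-contained version of the np.vectorize approach is sketched below. Passing otypes=[object] is an addition, not part of the answer: according to the NumPy documentation, without otypes the output type is determined by calling the function on the first element, and np.vectorize is provided for convenience rather than performance (it is essentially a Python loop).

```python
import numpy as np
import pandas as pd


def example_hash(name: str, age: int) -> str:
    return "In 10 years {} will be {}".format(name, age + 10)


df = pd.DataFrame({'name': ['Bob', 'Alice'], 'age': [20, 40], 'height': [2.0, 2.1]})

# Wrap the unchanged scalar function so it broadcasts over array-like inputs.
# otypes=[object] fixes the output dtype up front instead of inferring it
# from a trial call on the first element.
vec_hash = np.vectorize(example_hash, otypes=[object])

# Each (name, age) pair is passed to example_hash in turn.
df['new'] = vec_hash(df['name'], df['age'])
print(df)
```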
Using df.apply with a lambda, also without changing the custom method:
In [2207]: df['new'] = df.apply(lambda x: example_hash(x['name'], x['age']), axis=1)
In [2208]: df
Out[2208]:
name age height new
0 Bob 20 2.0 In 10 years Bob will be 30
1 Alice 40 2.1 In 10 years Alice will be 50
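When the function is only simple string formatting, as here, another option (not taken from either answer) is to express the same logic with column-wise Series operations, which avoids calling a Python function once per row. This is a sketch assuming the exact format string used in example_hash; it re-implements that logic rather than reusing the function, so it does not generalize to arbitrary functions.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Bob', 'Alice'], 'age': [20, 40], 'height': [2.0, 2.1]})

# Re-express example_hash's format string with whole-column operations:
# add 10 to the age column, convert the results to strings, then concatenate.
df['new'] = "In 10 years " + df['name'] + " will be " + (df['age'] + 10).astype(str)
print(df)
```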