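In pandas, how do I derive a column from multiple other columns?
For example, suppose I want to annotate my dataset with the correct form of address for each subject, perhaps to label some plots so that I can tell whose result is whose.
Take a dataset: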
import pandas as pd

data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'), ('male', 'Bart', 'Simpson'), ('female', 'Lisa', 'Simpson'), ('infant', 'Maggie', 'Simpson')]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])
So we have:
   gender first_name last_name
0    male      Homer   Simpson
1  female      Marge   Simpson
2    male       Bart   Simpson
3  female       Lisa   Simpson
4  infant     Maggie   Simpson
And a function that I want to apply to each row, storing the result in a new column:
def get_address(gender, first, last):
    title = ""
    if gender == 'male':
        title = 'Mr'
    elif gender == 'female':
        title = 'Ms'
    if title == '':
        return first + ' ' + last
    else:
        return title + ' ' + first[0] + '. ' + last
Currently my approach is:
people['address'] = list(map(lambda row: get_address(*row), people.values))
   gender first_name last_name         address
0    male      Homer   Simpson   Mr H. Simpson
1  female      Marge   Simpson   Ms M. Simpson
2    male       Bart   Simpson   Mr B. Simpson
3  female       Lisa   Simpson   Ms L. Simpson
4  infant     Maggie   Simpson  Maggie Simpson
This works, but it isn't elegant. Converting to an unindexed list and then assigning it back to an indexed column also feels wrong.
Answer 0 (score: 2)
What you are looking for is apply(func, axis=1).
This applies a function row by row across your data frame.
In your example, change the get_address function to:
def get_address(row):  # row is a pandas Series with the column names as its index
    title = ""
    gender = row['gender']     # extract gender from the pandas Series
    first = row['first_name']  # extract first name from the pandas Series
    last = row['last_name']    # extract last name from the pandas Series
    if gender == 'male':
        title = 'Mr'
    elif gender == 'female':
        title = 'Ms'
    if title == '':
        return first + ' ' + last
    else:
        return title + ' ' + first[0] + '. ' + last
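Then call people.apply(get_address, axis=1), which returns a new column. Strictly speaking, the result is a pandas Series that carries the same index as the data frame, which is how the data frame knows how to line it up when it is added as a column. To add it to the data frame, use this piece of code: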
people['address'] = people.apply(get_address,axis=1)
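For reference, here is a minimal end-to-end sketch of the row-wise approach on the sample data from the question. The TITLES lookup table is a hypothetical shortcut introduced only for this illustration; it is not part of the original answer, and it behaves the same as the if/elif chain for the three genders shown.

```python
import pandas as pd

data = [
    ("male", "Homer", "Simpson"),
    ("female", "Marge", "Simpson"),
    ("male", "Bart", "Simpson"),
    ("female", "Lisa", "Simpson"),
    ("infant", "Maggie", "Simpson"),
]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])

# Hypothetical lookup table (an assumption for this sketch):
# genders not listed here get no title.
TITLES = {"male": "Mr", "female": "Ms"}


def get_address(row):
    """Build the form of address from one row, passed in as a pandas Series."""
    title = TITLES.get(row["gender"], "")
    if not title:
        # No title: fall back to the full first name.
        return row["first_name"] + " " + row["last_name"]
    # Title plus first initial.
    return title + " " + row["first_name"][0] + ". " + row["last_name"]


# axis=1 hands each row to get_address; the resulting Series shares the
# frame's index, so the assignment aligns row by row.
people["address"] = people.apply(get_address, axis=1)
print(people)
```

Note that apply with axis=1 still calls the Python function once per row, so it is mainly a readability gain over the map-based version rather than a speed-up.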
Answer 1 (score: 1)
You can do this without any explicit loops:
In [70]: df
Out[70]:
   gender first_name last_name
0    male      Homer   Simpson
1  female      Marge   Simpson
2    male       Bart   Simpson
3  female       Lisa   Simpson
4  infant     Maggie   Simpson

In [71]: title = df.gender.replace({'male': 'Mr', 'female': 'Ms', 'infant': ''})

In [72]: initial = np.where(df.gender != 'infant', df.first_name.str[0] + '. ', df.first_name + ' ')

In [73]: initial
Out[73]: array(['H. ', 'M. ', 'B. ', 'L. ', 'Maggie '], dtype=object)

In [74]: address = (title + ' ' + Series(initial) + df.last_name).str.strip()

In [75]: address
Out[75]:
0     Mr H. Simpson
1     Ms M. Simpson
2     Mr B. Simpson
3     Ms L. Simpson
4    Maggie Simpson
dtype: object
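The session above can be sketched as a standalone script roughly as follows. This is a variation under stated assumptions rather than the exact code of the answer: it uses map plus fillna in place of replace, so that any gender missing from the dictionary falls back to an empty title, and it builds the intermediate Series with the frame's own index.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ("male", "Homer", "Simpson"),
        ("female", "Marge", "Simpson"),
        ("male", "Bart", "Simpson"),
        ("female", "Lisa", "Simpson"),
        ("infant", "Maggie", "Simpson"),
    ],
    columns=["gender", "first_name", "last_name"],
)

# Map each gender to a title; unmapped values become NaN, then "".
title = df["gender"].map({"male": "Mr", "female": "Ms"}).fillna("")
has_title = title != ""

# With a title use the first initial, otherwise the full first name.
name_part = np.where(
    has_title,
    df["first_name"].str[0] + ". ",
    df["first_name"] + " ",
)

# np.where returns a plain array, so wrap it in a Series that reuses
# df's index before concatenating; strip() removes the leading space
# left behind when the title is empty.
df["address"] = (
    title + " " + pd.Series(name_part, index=df.index) + df["last_name"]
).str.strip()
print(df)
```

Every step here operates on whole columns, so no Python-level function is called per row.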
Check out the documentation for the Series.str methods; they're pretty nifty. Most of the methods of Python's str are implemented, in addition to goodies such as extract.
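As a small illustration of extract, the sketch below splits address strings back into an optional title and the remaining name. The regular expression and the group names title and name are assumptions chosen for this example only.

```python
import pandas as pd

addresses = pd.Series(["Mr H. Simpson", "Ms M. Simpson", "Maggie Simpson"])

# Named groups become column names in the resulting DataFrame.
# The title group is optional, so rows without one get a missing value.
parts = addresses.str.extract(r"^(?:(?P<title>Mr|Ms) )?(?P<name>.+)$")
print(parts)
```

Rows that have no title come back with a missing value in the title column, which can be filled or filtered like any other missing data.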