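Is there a multi-column map function for DataFrames?

Time: 2014-08-01 05:44:53

Tags: python pandas

In pandas, how do I derive a column from multiple other columns?

For example, suppose I want to annotate my dataset with the correct form of address for each subject. Perhaps to label some plots with it, so I can tell who each result belongs to.

Take a dataset: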

import pandas as pd
data = [('male', 'Homer', 'Simpson'), ('female', 'Marge', 'Simpson'), ('male', 'Bart', 'Simpson'), ('female', 'Lisa', 'Simpson'), ('infant', 'Maggie', 'Simpson')]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])

So we have:

   gender first_name last_name
0    male      Homer   Simpson
1  female      Marge   Simpson
2    male       Bart   Simpson
3  female       Lisa   Simpson
4  infant     Maggie   Simpson

And a function that I want to apply to each row, storing the result in a new column:

def get_address(gender, first, last):
    title=""
    if gender=='male':
        title='Mr'
    elif gender=='female':
        title='Ms'

    if title=='':
        return first + ' '+ last
    else:
        return title + ' ' + first[0] + '. ' + last

My current approach is:

people['address'] = list(map(lambda row: get_address(*row), people.values))



   gender first_name last_name         address
0    male      Homer   Simpson   Mr H. Simpson
1  female      Marge   Simpson   Ms M. Simpson
2    male       Bart   Simpson   Mr B. Simpson
3  female       Lisa   Simpson   Ms L. Simpson
4  infant     Maggie   Simpson  Maggie Simpson

This works, but it isn't elegant. Converting to an unindexed list and then assigning back to an indexed column also feels wrong.

2 Answers:

Answer 0: (score: 2)

What you are looking for is apply(func, axis=1), which applies a function row by row across your DataFrame.
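To make the row-wise behavior concrete, here is a minimal sketch on a toy frame. The frame `df`, its columns `a` and `b`, and the helper `add_columns` are illustrative names, not part of the question's data. With axis=1 the function receives each row as a Series indexed by column name, and the result is a Series aligned to the frame's index.

```python
import pandas as pd

# Toy frame; the names df, a and b are illustrative only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

def add_columns(row):
    # With axis=1, `row` is a Series indexed by the column names "a" and "b".
    return row["a"] + row["b"]

# One call per row; the result keeps the index of df.
total = df.apply(add_columns, axis=1)
print(total.tolist())
```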

In your example, modify the get_address method to...

def get_address(row):  # row is a pandas Series indexed by column name
    title = ""
    gender = row['gender']     # extract gender from the Series
    first = row['first_name']  # extract first name from the Series
    last = row['last_name']    # extract last name from the Series

    if gender == 'male':
        title = 'Mr'
    elif gender == 'female':
        title = 'Ms'

    if title == '':
        return first + ' ' + last
    else:
        return title + ' ' + first[0] + '. ' + last

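Then call people.apply(get_address, axis=1), which returns the new column (it is actually a pandas Series with the correct index, which is how the DataFrame knows to add it correctly as a column). To add it to the DataFrame, use this snippet...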

people['address'] = people.apply(get_address,axis=1)
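If you would rather keep the original three-argument get_address from the question, one possible variant is to unpack the columns yourself, either in a lambda passed to apply or by zipping the columns together. The sketch below restates the question's function (using a dictionary lookup for the title, which is my own shorthand) so that it runs on its own; the column name `address_zip` is only there to show both forms side by side.

```python
import pandas as pd

def get_address(gender, first, last):
    # Same logic as the function in the question, written with a dict lookup.
    title = {"male": "Mr", "female": "Ms"}.get(gender, "")
    if title == "":
        return first + " " + last
    return title + " " + first[0] + ". " + last

data = [("male", "Homer", "Simpson"), ("female", "Marge", "Simpson"),
        ("male", "Bart", "Simpson"), ("female", "Lisa", "Simpson"),
        ("infant", "Maggie", "Simpson")]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])

# Variant 1: row-wise apply, unpacking the fields inside a lambda.
people["address"] = people.apply(
    lambda row: get_address(row["gender"], row["first_name"], row["last_name"]),
    axis=1,
)

# Variant 2: zip the three columns; the list is assigned back positionally.
people["address_zip"] = [
    get_address(g, f, l)
    for g, f, l in zip(people["gender"], people["first_name"], people["last_name"])
]
print(people)
```

The zip form avoids building a Series for every row, which tends to matter on large frames, but it gives up index alignment, so it is only safe when the frame has not been reordered in between.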

Answer 1: (score: 1)

You can do this without any explicit loops:

In [70]: df
Out[70]:
   gender first_name last_name
0    male      Homer   Simpson
1  female      Marge   Simpson
2    male       Bart   Simpson
3  female       Lisa   Simpson
4  infant     Maggie   Simpson

In [71]: title = df.gender.replace({'male': 'Mr', 'female': 'Ms', 'infant': ''})

In [72]: initial = np.where(df.gender != 'infant', df.first_name.str[0] + '. ', df.first_name + ' ')
In [73]: initial
Out[73]: array(['H. ', 'M. ', 'B. ', 'L. ', 'Maggie '], dtype=object)

In [74]: address = (title + ' ' + pd.Series(initial, index=df.index) + df.last_name).str.strip()

In [75]: address
Out[75]:
0     Mr H. Simpson
1     Ms M. Simpson
2     Mr B. Simpson
3     Ms L. Simpson
4    Maggie Simpson
dtype: object
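For reference, the same loop-free idea can be sketched as a standalone script. This is a variant rather than a transcript of the session above: it swaps in Series.map with fillna for the title lookup and Series.where to choose between the two string forms, and it uses the `people` frame from the question instead of `df`.

```python
import pandas as pd

data = [("male", "Homer", "Simpson"), ("female", "Marge", "Simpson"),
        ("male", "Bart", "Simpson"), ("female", "Lisa", "Simpson"),
        ("infant", "Maggie", "Simpson")]
people = pd.DataFrame(data, columns=["gender", "first_name", "last_name"])

# Look up the title; genders missing from the dict become NaN, then "".
title = people["gender"].map({"male": "Mr", "female": "Ms"}).fillna("")
has_title = title != ""

# Build both candidate strings for every row using vectorized string ops.
short = title + " " + people["first_name"].str[0] + ". " + people["last_name"]
full = people["first_name"] + " " + people["last_name"]

# Keep the short form where a title exists, otherwise fall back to the full name.
people["address"] = short.where(has_title, full)
print(people)
```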

Check out the documentation for Series.str methods; they're pretty slick. Most of the methods from Python's str are implemented, plus extras such as extract.
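As a small illustration of those string methods, the sketch below runs a few of them on a made-up Series of full names. The Series `names` and the regex group names `first` and `last` are assumptions chosen for the example.

```python
import pandas as pd

# Made-up input for illustration.
names = pd.Series(["Homer Simpson", "Marge Simpson", "Maggie Simpson"])

# Indexing through .str takes the first character of every element.
initials = names.str[0]

# Most plain str methods have a vectorized counterpart.
upper = names.str.upper()

# extract pulls regex groups out into columns; named groups become column names.
parts = names.str.extract(r"^(?P<first>\w+)\s+(?P<last>\w+)$")
print(parts)
```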