Using pandas Series.map to produce multiple columns of a DataFrame

Time: 2018-04-18 18:22:14

Tags: python pandas dataframe

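Suppose I want to use some function f(x) to map a pandas Series to multiple columns of a DataFrame.

Ideally I would use one function per column. But suppose the columns share a lot of heavy computation, so I want all of it done together (once per row; rows can be processed independently).

Is there an easier or more Pythonic ("pandastic"?) way than the following: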

import pandas as pd

s = pd.Series('Joe Jill Stephen Mark Craig Alexander Emily Connor Cassidy'.split())

def f(x):
    """ computations that should be done in tandem
    (this is an easy example but the use case is for
    expensive operations that return multiple outputs)"""
    return (len(x), x[1:]) 

def map_series_to_columns(s, f, names):
    """ returns a DataFrame to extract series """
    s2 = s.map(f)   # create an intermediate result first
    return pd.DataFrame(
        {name: s2.map(lambda x: x[k]).rename(name) 
         for k,name in enumerate(names)},
        columns=names)

map_series_to_columns(s, f, ['len', 'slice'])

This returns the following (which is what I want):

   len     slice
0    3        oe
1    4       ill
2    7    tephen
3    4       ark
4    5      raig
5    9  lexander
6    5      mily
7    6     onnor
8    7    assidy

Along the way, a Series of tuples is created (not something I need or want, but it seems unavoidable as an intermediate result):

0          (3, oe)
1         (4, ill)
2      (7, tephen)
3         (4, ark)
4        (5, raig)
5    (9, lexander)
6        (5, mily)
7       (6, onnor)
8      (7, assidy)
dtype: object
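
For what it's worth, the tuple Series does not have to be materialized: the per-row results can be transposed into per-column sequences with zip(*...). The following is only a minimal sketch under that assumption. It repeats s and f from above so that it runs by itself, and the helper name map_series_to_columns_zip is made up for illustration.

```python
import pandas as pd

s = pd.Series('Joe Jill Stephen Mark Craig Alexander Emily Connor Cassidy'.split())

def f(x):
    # stand-in for an expensive function that returns several outputs at once
    return (len(x), x[1:])

def map_series_to_columns_zip(s, f, names):
    # map() evaluates f per element; zip(*...) transposes the
    # row tuples into one sequence per output column
    columns = zip(*map(f, s))
    data = {name: list(col) for name, col in zip(names, columns)}
    return pd.DataFrame(data, index=s.index, columns=names)

result = map_series_to_columns_zip(s, f, ['len', 'slice'])
```

The tuples still exist briefly as plain Python objects, but no object-dtype Series is built from them, and the original index of s is preserved.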

I should add: I'm not concerned about the computational cost of the pandas calls; I expect the CPU bottleneck to be in my function, which is unavoidable.

1 answer:

Answer 0: (score: 1)

s = pd.Series('Joe Jill Stephen Mark Craig Alexander Emily Connor Cassidy'.split()).to_frame('Name')

# result_type='expand' spreads each returned tuple across separate columns
s[['len', 'Update_name']] = s.apply(lambda row: (len(row['Name']), row['Name'][1:]), axis=1, result_type='expand')

s
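
The same idea can probably be written with a named function and without modifying the frame in place. This is only a sketch: the names f, expanded and out are illustrative, and it relies on result_type='expand', a DataFrame.apply option available since pandas 0.23.

```python
import pandas as pd

df = pd.Series('Joe Jill Stephen Mark Craig Alexander Emily Connor Cassidy'.split()).to_frame('Name')

def f(x):
    # stand-in for the expensive multi-output computation from the question
    return (len(x), x[1:])

# Each row's tuple is expanded into columns labelled 0 and 1
expanded = df.apply(lambda row: f(row['Name']), axis=1, result_type='expand')
expanded.columns = ['len', 'Update_name']

# Join on the shared index, leaving df itself unchanged
out = df.join(expanded)
```

Naming the expanded columns explicitly and then joining keeps the assignment step separate from the computation, which may be easier to read when f returns more than two values.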
