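Suppose I want to use some function f(x) to map a pandas Series to multiple columns of a DataFrame.
Ideally I would use one function per column. But suppose there is a lot of overlapping heavy computation, so I want all of the computations done together (once per row; rows can be processed independently).
Is there an easier or more Pythonic ("pandastic"?) way than the following: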
import pandas as pd

s = pd.Series('Joe Jill Stephen Mark Craig Alexander Emily Connor Cassidy'.split())

def f(x):
    """ computations that should be done in tandem
    (this is an easy example but the use case is for
    expensive operations that return multiple outputs)"""
    return (len(x), x[1:])

def map_series_to_columns(s, f, names):
    """ returns a DataFrame to extract series """
    s2 = s.map(f)   # create an intermediate result first
    return pd.DataFrame(
        {name: s2.map(lambda x: x[k]).rename(name)
         for k, name in enumerate(names)},
        columns=names)

map_series_to_columns(s, f, ['len', 'slice'])
which returns the following (this is what I want):
   len     slice
0    3        oe
1    4       ill
2    7    tephen
3    4       ark
4    5      raig
5    9  lexander
6    5      mily
7    6     onnor
8    7    assidy
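Along the way, a Series containing tuples is created (this is not something I need or want, but it seems unavoidable as an intermediate computation):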
0 (3, oe)
1 (4, ill)
2 (7, tephen)
3 (4, ark)
4 (5, raig)
5 (9, lexander)
6 (5, mily)
7 (6, onnor)
8 (7, assidy)
dtype: object
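I should add: I am not worried about the computational cost of the pandas calls; I expect the CPU bottleneck to be in my function, and that bottleneck is unavoidable.
Answer 0 (score: 1)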
s = pd.Series('Joe Jill Stephen Mark Craig Alexander Emily Connor Cassidy'.split()).to_frame('Name')
# Compute both values once per row; result_type="expand" splits the tuple into columns
s[['len', 'Update_name']] = s.apply(lambda row: (len(row['Name']), row['Name'][1:]), axis=1, result_type="expand")
s
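Another option is to pass the list of tuples straight to the DataFrame constructor, which splits them into columns with a single call to f per element. This is a minimal sketch, assuming f returns a tuple of the same length for every element. It reuses the names f, map_series_to_columns, and the sample Series from the question; the `rows` variable is introduced here for illustration only.

```python
import pandas as pd

def f(x):
    # Stand-in for the expensive computation; returns several outputs at once.
    return (len(x), x[1:])

def map_series_to_columns(s, f, names):
    # Call f exactly once per element and collect the returned tuples.
    rows = [f(x) for x in s]
    # The DataFrame constructor turns each tuple into one row,
    # with one column per tuple position; the original index is kept.
    return pd.DataFrame(rows, index=s.index, columns=names)

s = pd.Series('Joe Jill Stephen Mark Craig Alexander Emily Connor Cassidy'.split())
result = map_series_to_columns(s, f, ['len', 'slice'])
print(result)
```

The list of tuples still exists as an intermediate value, but it is never wrapped in an object-dtype Series and each output column is built in one pass rather than with a separate map per column.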