I'm curious whether several functions can be applied to a single pandas DataFrame column. For example, suppose I have three functions:
In:
def foo(col):
    if 'hi' in col:
        return 'TRUE'

def bar(col):
    if 'bye' in col:
        return 'TRUE'

def baz(col):
    if 'ok' in col:
        return 'TRUE'
And the following DataFrame:
import pandas as pd

dfs = pd.DataFrame({'col': ['The quick hi brown fox hi jumps over the lazy dog',
                            'The quick hi brown fox bye jumps over the lazy dog',
                            'The NO quick brown fox ok jumps bye over the lazy dog']})
If I want to apply each function to col, I would normally use the pandas apply function:
dfs['new_col1'] = dfs['col'].apply(foo)
dfs['new_col2'] = dfs['col'].apply(bar)
dfs['new_col3'] = dfs['col'].apply(baz)
dfs
Out:
                                                 col  new_col1  new_col2  new_col3
0  The quick hi brown fox hi jumps over the lazy dog      TRUE      None      None
1  The quick hi brown fox bye jumps over the lazy...      TRUE      TRUE      None
2  The NO quick brown fox ok jumps bye over the l...      None      TRUE      TRUE
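However, as you can see, this creates 3 columns. So my question is: how can I efficiently apply the 3 functions above to a specific column at the same time in a large DataFrame? The expected result is:
                                                 col     new_col
0  The quick hi brown fox hi jumps over the lazy dog        TRUE
1  The quick hi brown fox bye jumps over the lazy...  TRUE, TRUE
2  The NO quick brown fox ok jumps bye over the l...  TRUE, TRUE
Note that I know I could merge the 3 columns into one afterwards. I would still like to know whether the above is possible.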
Answer 0 (score: 4)
Why not wrap all the functions in one giant function?
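def oneGiantFunc(col):
    def foo(col):
        if 'hi' in col:
            return 'TRUE'

    def bar(col):
        if 'bye' in col:
            return 'TRUE'

    def baz(col):
        if 'ok' in col:
            return 'TRUE'

    a = foo(col)
    b = bar(col)
    c = baz(col)
    # Join only the results that matched. '{} {} {}'.format(a, b, c) would
    # write the literal text 'None' for every function that did not match.
    return ', '.join(r for r in (a, b, c) if r is not None)

dfs['new_col'] = dfs['col'].apply(oneGiantFunc)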
Answer 1 (score: 2)
You can use apply with a list comprehension to filter out the None values:
dfs['new_col'] = dfs['col'].apply(lambda x: ', '.join([r for r in
                                  [foo(x), bar(x), baz(x)] if r is not None]))
print(dfs)
                                                 col     new_col
0  The quick hi brown fox hi jumps over the lazy dog        TRUE
1  The quick hi brown fox bye jumps over the lazy...  TRUE, TRUE
2  The NO quick brown fox ok jumps bye over the l...  TRUE, TRUE
Answer 2 (score: 1)
I don't think you can do this "simultaneously". However, here are 2 options:
1. Assuming the functions are already defined:
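# foo, bar and baz return 'TRUE' or None, so turn each result into a boolean mask before combining with &
dfs['new_col1'] = dfs['col'].apply(foo).notna() & dfs['col'].apply(bar).notna() & dfs['col'].apply(baz).notna()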
2. Redefine the function
def aao(col):  # all at once
    if ('hi' in col) and ('bye' in col) and ('ok' in col):
        return 'TRUE'

dfs['new_col'] = dfs['col'].apply(aao)
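Both options in this answer test whether all three substrings occur, which is a different result from the comma-joined column the question asks for. As a rough sketch, option 1 can also be written without apply, using Series.str.contains with regex=False as a stand-in for the 'in' checks inside foo, bar and baz. The names has_hi, has_bye, has_ok, all_three and counts are made up for this illustration, and the last step (rebuilding the joined strings from the match count) is an assumption about what the question wants, not part of the answer.

```python
import pandas as pd

dfs = pd.DataFrame({'col': ['The quick hi brown fox hi jumps over the lazy dog',
                            'The quick hi brown fox bye jumps over the lazy dog',
                            'The NO quick brown fox ok jumps bye over the lazy dog']})

# Plain substring tests (regex=False), mirroring `'hi' in col` and friends.
has_hi = dfs['col'].str.contains('hi', regex=False)
has_bye = dfs['col'].str.contains('bye', regex=False)
has_ok = dfs['col'].str.contains('ok', regex=False)

# Option 1 as a boolean mask: True only where all three substrings occur.
all_three = has_hi & has_bye & has_ok
print(all_three.tolist())  # [False, False, False]

# Assumption: the question wants one 'TRUE' per matching function, comma-joined.
counts = has_hi.astype(int) + has_bye.astype(int) + has_ok.astype(int)
dfs['new_col'] = counts.map(lambda n: ', '.join(['TRUE'] * n))
print(dfs['new_col'].tolist())  # ['TRUE', 'TRUE, TRUE', 'TRUE, TRUE']
```

No row in the sample data contains all three substrings, so the mask is False everywhere, while the count-based column matches the expected output in the question.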
Answer 3 (score: 1)
Use a lambda function, e.g.
lambda x: ', '.join([f(x) for f in [foo, bar, baz] if f(x)])
in the call to apply. Full example:
In : dfs['new_col'] = dfs['col'].apply(lambda x: ', '.join([f(x) for f in [foo, bar, baz] if f(x)]))
In : dfs
Out:
                                                 col     new_col
0  The quick hi brown fox hi jumps over the lazy dog        TRUE
1  The quick hi brown fox bye jumps over the lazy...  TRUE, TRUE
2  The NO quick brown fox ok jumps bye over the l...  TRUE, TRUE
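The lambda in this answer calls each function twice per row, once in the filter and once for the value. A minimal sketch of a variant that calls each function only once is below. join_results is a hypothetical helper name, not something from the answers, and the three functions are repeated from the question so the snippet runs on its own with plain strings.

```python
def foo(col):
    if 'hi' in col:
        return 'TRUE'

def bar(col):
    if 'bye' in col:
        return 'TRUE'

def baz(col):
    if 'ok' in col:
        return 'TRUE'

def join_results(text, funcs=(foo, bar, baz)):
    """Call every function once on text and join the non-None results."""
    results = [f(text) for f in funcs]
    return ', '.join(r for r in results if r is not None)

# The same three strings as the 'col' column in the question.
rows = ['The quick hi brown fox hi jumps over the lazy dog',
        'The quick hi brown fox bye jumps over the lazy dog',
        'The NO quick brown fox ok jumps bye over the lazy dog']

joined = [join_results(row) for row in rows]
print(joined)  # ['TRUE', 'TRUE, TRUE', 'TRUE, TRUE']
```

With the DataFrame from the question, the same helper would be passed straight to apply: dfs['new_col'] = dfs['col'].apply(join_results).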