My DataFrame df has several columns, as shown below:
       col1      col2
0  0.627521  0.026832
1  0.470450  0.319736
2  0.015760  0.484664
3  0.645810  0.733688
4  0.850554  0.506945
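I want to apply a function to each column and add the results as additional columns (similar to this question). Each new column name should be the original name plus a suffix shared by all the added columns.
I tried the following (a highly simplified case):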
import pandas as pd
import numpy as np

def do_and_rename(s, s2):
    news = s + s2
    news.name = s.name + "_change"
    return news

df = pd.DataFrame({'col1': np.random.rand(5), 'col2': np.random.rand(5)})
new_df = pd.concat([df, df.apply(lambda x: do_and_rename(x, df.index))], axis=1)
which gives me
       col1      col2      col1      col2
0  0.627521  0.026832  0.627521  0.026832
1  0.470450  0.319736  1.470450  1.319736
2  0.015760  0.484664  2.015760  2.484664
3  0.645810  0.733688  3.645810  3.733688
4  0.850554  0.506945  4.850554  4.506945
The values are correct, but the new column names are wrong.
My desired output is
       col1      col2  col1_change  col2_change
0  0.627521  0.026832     0.627521     0.026832
1  0.470450  0.319736     1.470450     1.319736
2  0.015760  0.484664     2.015760     2.484664
3  0.645810  0.733688     3.645810     3.733688
4  0.850554  0.506945     4.850554     4.506945
If I call
do_and_rename(df['col1'], df.index)
directly, I get
0    0.627521
1    1.470450
2    2.015760
3    3.645810
4    4.850554
Name: col1_change, dtype: float64
with the correct name. How can I use these returned names as the column headers?
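The direct call keeps the name, but DataFrame.apply collects the returned Series under each column's original label, so the name set inside do_and_rename is not used (hence the duplicated col1/col2 headers). A minimal sketch that keeps the apply call and renames the result afterwards with DataFrame.rename, assuming a fixed "_change" suffix is all that is needed:

```python
import numpy as np
import pandas as pd

def do_and_rename(s, s2):
    news = s + s2
    news.name = s.name + "_change"  # not picked up by apply
    return news

df = pd.DataFrame({'col1': np.random.rand(5), 'col2': np.random.rand(5)})

# apply labels each result column with the original column name ...
changed = df.apply(lambda x: do_and_rename(x, df.index))
print(changed.columns.tolist())  # ['col1', 'col2']

# ... so attach the suffix explicitly before concatenating
changed = changed.rename(columns=lambda c: c + '_change')
new_df = pd.concat([df, changed], axis=1)
print(new_df.columns.tolist())  # ['col1', 'col2', 'col1_change', 'col2_change']
```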
Answer 0 (score: 2)
This works for me:
new_df = pd.concat([df] + [do_and_rename(df[x], df.index) for x in df], axis=1)
print(new_df)
       col1      col2  col1_change  col2_change
0  0.364028  0.694481     0.364028     0.694481
1  0.457195  0.813740     1.457195     1.813740
2  0.286694  0.133999     2.286694     2.133999
3  0.130283  0.398216     3.130283     3.398216
4  0.694586  0.936815     4.694586     4.936815
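This works because pd.concat(..., axis=1) uses each Series' name as its column label, whereas apply does not. A small reproduction with fixed numbers, made up here only so the result is easy to check (they are not from the question):

```python
import pandas as pd

def do_and_rename(s, s2):
    news = s + s2
    news.name = s.name + "_change"
    return news

# Made-up deterministic data instead of np.random.rand
df = pd.DataFrame({'col1': [0.5, 0.25, 0.125], 'col2': [1.0, 2.0, 3.0]})

# Each Series from the list comprehension keeps its own name,
# and concat along axis=1 turns those names into column headers.
new_df = pd.concat([df] + [do_and_rename(df[x], df.index) for x in df], axis=1)
print(new_df)
```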
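Answer 1 (score: 1)
If you don't want to create a new DataFrame, you can add the columns to df in place:
for col in list(df):  # iterate over a snapshot of the original column names
    df[col + '_change'] = df[col] + df.index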
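The same in-place loop can be generalized to an arbitrary per-column function. A sketch, where add_columns_inplace and its suffix parameter are hypothetical names introduced only for illustration and the data are made up:

```python
import pandas as pd

def add_columns_inplace(df, func, suffix='_change'):
    """Hypothetical helper: add func(df[col]) as col + suffix for every original column."""
    for col in list(df):  # snapshot the names so new columns are not revisited
        df[col + suffix] = func(df[col])

df = pd.DataFrame({'col1': [0.5, 0.25, 0.125], 'col2': [1.0, 2.0, 3.0]})

# Adding the row index reproduces the example from the question
add_columns_inplace(df, lambda s: s + s.index)
print(df)
```

The helper returns nothing; df itself gains the new columns, which matches the intent of avoiding a second DataFrame.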
Answer 2 (score: 1)
You can use the df.join(your_func(df, args ...,).add_suffix('_change')) pattern, where your_func returns your modified DataFrame:
In [1459]: def your_func(df, s):
      ...:     dff = df.add(s, axis=0)
      ...:     return dff
      ...:

In [1460]: df.join(your_func(df, df.index.values).add_suffix('_change'))
Out[1460]:
       col1      col2  col1_change  col2_change
0  0.627521  0.026832     0.627521     0.026832
1  0.470450  0.319736     1.470450     1.319736
2  0.015760  0.484664     2.015760     2.484664
3  0.645810  0.733688     3.645810     3.733688
4  0.850554  0.506945     4.850554     4.506945

In [1461]: df
Out[1461]:
       col1      col2
0  0.627521  0.026832
1  0.470450  0.319736
2  0.015760  0.484664
3  0.645810  0.733688
4  0.850554  0.506945
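Since your_func can be any DataFrame-to-DataFrame transformation, the pattern can be wrapped in a small reusable helper. A sketch, where join_with_suffix is a hypothetical name and the data are made up so the result can be checked:

```python
import pandas as pd

def your_func(df, s):
    # Same as in the answer: add s to every column, row by row
    return df.add(s, axis=0)

def join_with_suffix(df, func, *args, suffix='_change'):
    """Hypothetical helper: join func(df, *args) back onto df with suffixed column names."""
    return df.join(func(df, *args).add_suffix(suffix))

df = pd.DataFrame({'col1': [0.5, 0.25, 0.125], 'col2': [1.0, 2.0, 3.0]})
new_df = join_with_suffix(df, your_func, df.index.values)
print(new_df)
```

Because join returns a new DataFrame aligned on the index, df keeps only its original columns, just as the Out[1461] output shows.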