Passing multiple dataframes through a function at the same time

Time: 2016-12-17 11:35:50

Tags: python pandas

How can I pass df10 and df20 (or even more dataframes) through func at the same time and keep their names for further use?

import pandas as pd
import numpy as np

df = pd.DataFrame( {
   'A': ['d','d','d','d','d','d','g','g','g','g','g','g','k','k','k','k','k','k'],
   'B': [5,5,6,4,5,6,-6,7,7,6,-7,7,-8,7,-6,6,-7,50],
   'C': [1,1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2],
   'S': [2012,2013,2014,2015,2016,2012,2012,2014,2015,2016,2012,2013,2012,2013,2014,2015,2016,2014]     
})

df10 = (df.B + df.C).groupby([df.A, df.S]).agg(['sum','size']).unstack(fill_value=0)

df20 = (df['B'] - df['C']).groupby([df.A, df.S]).agg(['sum','size']).unstack(fill_value=0)

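def func(df):
    # Total each top-level column group ('sum' and 'size') across all years.
    # Transposing avoids groupby(..., axis=1), which recent pandas versions deprecate.
    df1 = df.T.groupby(level=0).sum().T
    # Label each group's total as a 'total' column under that group.
    new_cols = list(zip(df1.columns.get_level_values(0), ['total'] * len(df1.columns)))
    df1.columns = pd.MultiIndex.from_tuples(new_cols)
    # Put the totals next to the yearly columns and sort them together.
    df2 = pd.concat([df1, df], axis=1).sort_index(axis=1).sort_index(axis=1, level=1)
    # Flatten the MultiIndex: 'sum_2012' -> '2012', 'size_2012' -> 'T2012'.
    df2.columns = ['_'.join((col[0], str(col[1]))) for col in df2.columns]
    df2.columns = df2.columns.str.replace('sum_', '')
    df2.columns = df2.columns.str.replace('size_', 'T')
    return df2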

Edit: printing the dataframes as requested:

print(df10)
print(df20)

df10:

     sum                           size
S    2012  2013  2014  2015  2016  2012  2013  2014  2015  2016
A
d      13     6     7     5     6     2     1     1     1     1
g     -11     8     8     8     7     2     1     1     1     1
k      -6     9    48     8    -5     1     1     2     1     1

df20:

     sum                           size
S    2012  2013  2014  2015  2016  2012  2013  2014  2015  2016
A
d       9     4     5     3     4     2     1     1     1     1
g     -15     6     6     6     5     2     1     1     1     1
k     -10     5    40     4    -9     1     1     2     1     1

1 answer:

Answer 0 (score: 4)

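Edit: there may be better ways to do this; I just thought I would put this suggestion forward. If it is not what was asked, let me know and I will delete it.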

How can I pass df10 and df20 (or even more dataframes) through func at the same time and keep their names for further use?

If you just want to pass multiple dataframes through func, and all the dataframes have the same format, then the following may work.

For simplicity, take these dataframes:

df10 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
df20 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
df30 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})

and a simple function:

def your_func(df):
    # Perform some action/change to df, e.g. keep only the first row
    df2 = df.head(1)
    return df2

Create a list of the original dataframes:

A = [df10,df20,df30]

Output:

A = [   one  two
    0  1.0  4.0
    1  2.0  3.0
    2  3.0  2.0
    3  4.0  1.0,
        one  two
    0  1.0  4.0
    1  2.0  3.0
    2  3.0  2.0
    3  4.0  1.0,
        one  two
    0  1.0  4.0
    1  2.0  3.0
    2  3.0  2.0
    3  4.0  1.0]

Then use a for loop to pass each dataframe in the list through the function. Because each result is written back into the list rather than into df10, df20 or df30, the original dataframes stay unchanged.

for i in range(len(A)):
    A[i] = your_func(A[i])

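So now the list A contains each new dataframe:

A = [   one  two
    0  1.0  4.0,
        one  two
    0  1.0  4.0,
        one  two
    0  1.0  4.0]

Your original dataframes df10, df20 and df30 remain unchanged. To access the new dataframes, just index into A (A[0] is the result for df10, A[1] for df20, and so on).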
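A list like A only remembers positions, and a DataFrame does not know which variable name it was assigned to. If the names matter for later use, as the question asks, one possible extension, assuming you are happy to type the names out as string keys, is to keep the frames in a dict and build the results with a dict comprehension. The names frames and results below are illustrative choices, not anything pandas requires:

```python
import pandas as pd

# Same sample frames and helper as in the answer above.
df10 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
df20 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
df30 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})


def your_func(df):
    # Placeholder transformation: keep only the first row (returns a new object).
    return df.head(1)


# Hypothetical mapping from a chosen name to each original frame.
frames = {'df10': df10, 'df20': df20, 'df30': df30}

# Apply the function to every frame; each result is stored under the same name.
results = {name: frame_result for name, frame_result in
           ((name, your_func(frame)) for name, frame in frames.items())}

# Look up a transformed frame by the name of its original.
print(results['df20'])

# The originals are untouched because your_func does not modify its input.
print(len(df10))
```

Because the keys are ordinary strings, results['df20'] plays the role of a transformed df20. Any function that takes one dataframe and returns one, such as the question's func, should slot in the same way in place of your_func.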