Python pandas: merging or concatenating DataFrames

Date: 2015-08-31 08:16:36

Tags: python pandas

I have a series of CSV files that I load into DataFrames and store in a list (dataframesArray). The list and its DataFrames look like this:

    dataframesArray [            
    BBG.XAMS.UL.S_pnl_pos_cost
        date                                  
        2015-03-23                    0.000000
        2015-03-24                    0.000000
        2015-03-25                   -0.674717
        2015-03-26                   69.140999
        2015-03-27                  -70.128728,             
    BBG.XAMS.UNA.S_pnl_pos_cost
        date                                   
        2015-03-23                    -0.674929
        2015-03-24                   -15.138444
        2015-03-25                    90.830662
        2015-03-26                    21.446129
        2015-03-27                    -2.554376,             
    BBG.XAMS.UL.S_pnl_pos_cost
        date                                  
        2014-10-20                  -15.220730
        2014-10-21                 3031.610010
        2014-10-22                 1976.815412
        2014-10-23                -2974.037294
        2014-10-24                  796.775000,
   BBG.XAMS.UNA.S_pnl_pos_cost
        date                                   
        2014-10-20                    -4.140378
        2014-10-21                   618.064066
        2014-10-22                   -71.104800
        2014-10-23                   828.063647
        2014-10-24                     0.000000]

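The data covers dates for 2 products (BBG.XAMS.UL.S_pnl_pos_cost and BBG.XAMS.UNA.S_pnl_pos_cost), and there will be more products in the future. I want to concatenate or merge (I'm not sure which) the list of DataFrames into a single DataFrame (called result) that looks like this: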

            BBG.XAMS.UL.S_pnl_pos_cost  BBG.XAMS.UNA.S_pnl_pos_cost
date
2014-10-20                 -15.220730                     -4.140378   
2014-10-21                3031.610010                    618.064066   
2014-10-22                1976.815412                    -71.104800   
2014-10-23               -2974.037294                    828.063647   
2014-10-24                 796.775000                      0.000000   
2015-03-23                    0.000000                    -0.674929   
2015-03-24                    0.000000                   -15.138444   
2015-03-25                   -0.674717                    90.830662   
2015-03-26                   69.140999                    21.446129   
2015-03-27                  -70.128728                    -2.554376  

I tried to do this with:

result = pd.concat(dataframesArray,axis=1)

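where the axis is the date index. The data does appear to be merged by date, but I am missing the data for the week starting 2015-03-23. My current concat result looks like this: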

            BBG.XAMS.UL.S_pnl_pos_cost  BBG.XAMS.UNA.S_pnl_pos_cost
date                                                                 
2014-10-20                  -15.220730                    -4.140378  
2014-10-21                 3031.610010                   618.064066  
2014-10-22                 1976.815412                   -71.104800  
2014-10-23                -2974.037294                   828.063647  
2014-10-24                  796.775000                     0.000000  
2015-03-23                         NaN                          NaN  
2015-03-24                         NaN                          NaN  
2015-03-25                         NaN                          NaN  
2015-03-26                         NaN                          NaN  
2015-03-27                         NaN                          NaN  

My code at the moment is:

# Runs inside the loop over the CSV files; f is the current file and row supplies the new column name.
stockPricesDf = pd.read_csv(f, engine='c', header=0, index_col=0, parse_dates=True, infer_datetime_format=True, usecols=(0, 3))
stockPricesDf.rename(columns={'adjusted_last_acc': row}, inplace=True)
dataframesArray.append(stockPricesDf)
result = pd.concat(dataframesArray, axis=1)

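I am looping over a number of directories to get the product data, which is stored in CSV files.

Could someone tell me what I am doing wrong and how to fix it?

Many thanks.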

1 answer:

Answer 0 (score: 2)

Try this:

result = pd.concat(dataframesArray, axis=1) # like you did
result = result.groupby(result.columns, axis=1).sum()

As you can see, the first step produces this (the numbers here are illustrative):

                  UL       UNA        UL       UNA
2015-03-23  2.169534  0.294107       NaN       NaN
2015-03-24 -0.077550 -0.758760       NaN       NaN
2015-03-25  0.159659 -3.167541       NaN       NaN
2015-03-26  0.895535  0.944644       NaN       NaN
2015-03-27 -0.385408 -0.005069       NaN       NaN
2015-10-20       NaN       NaN  1.855446 -0.229635
2015-10-21       NaN       NaN -0.400450 -0.237323
2015-10-22       NaN       NaN  1.103165  0.718134
2015-10-23       NaN       NaN -0.157415  1.119828
2015-10-24       NaN       NaN -0.016321 -0.371061
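The shape of that intermediate frame can be checked directly. The following is a minimal sketch, assuming four small frames with made-up values and the shortened column names UL and UNA used in the table above; the variable names `march`, `october`, `frames`, and `wide` are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical stand-ins for the CSV-derived frames; all values are made up.
march = pd.date_range("2015-03-23", periods=2, name="date")
october = pd.date_range("2014-10-20", periods=2, name="date")
frames = [
    pd.DataFrame({"UL": [1.0, 2.0]}, index=march),
    pd.DataFrame({"UNA": [3.0, 4.0]}, index=march),
    pd.DataFrame({"UL": [5.0, 6.0]}, index=october),
    pd.DataFrame({"UNA": [7.0, 8.0]}, index=october),
]

# axis=1 places the frames side by side: one output column per input column,
# with rows aligned on the union of the date indexes.
wide = pd.concat(frames, axis=1)

print(wide.shape)                       # (number of distinct dates, number of input columns)
print(wide.columns.tolist())            # the labels repeat, one per input frame
print(wide.columns.duplicated().any())  # True when a label occurs more than once
print(wide)
```

Each input frame keeps its own column, so a date that is absent from a frame shows NaN in that frame's column rather than being dropped.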

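The second step groups the columns that share the same name into a single column. Because sum() skips NaN, each date keeps the one value that is present for that product: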

                  UL       UNA
2015-03-23  2.169534  0.294107
2015-03-24 -0.077550 -0.758760
2015-03-25  0.159659 -3.167541
2015-03-26  0.895535  0.944644
2015-03-27 -0.385408 -0.005069
2015-10-20  1.855446 -0.229635
2015-10-21 -0.400450 -0.237323
2015-10-22  1.103165  0.718134
2015-10-23 -0.157415  1.119828
2015-10-24 -0.016321 -0.371061
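The two steps can be put together in a small runnable sketch. It rests on a few assumptions: the frames and values are made up, the names `frames`, `by_groupby`, `by_product`, and `by_stacking` are hypothetical, and the grouping is done on the transposed frame because recent pandas releases deprecate `groupby(..., axis=1)`. The second variant is an alternative rather than the answer's method: it stacks each product's frames along the rows first and only then joins the products as columns.

```python
import pandas as pd

# Hypothetical stand-ins for the CSV-derived frames; all values are made up.
march = pd.date_range("2015-03-23", periods=2, name="date")
october = pd.date_range("2014-10-20", periods=2, name="date")
frames = [
    pd.DataFrame({"UL": [1.0, 2.0]}, index=march),
    pd.DataFrame({"UNA": [3.0, 4.0]}, index=march),
    pd.DataFrame({"UL": [5.0, 6.0]}, index=october),
    pd.DataFrame({"UNA": [7.0, 8.0]}, index=october),
]

# Variant 1: concat side by side, then collapse columns that share a name.
# Grouping the transposed frame by its index avoids the axis=1 keyword.
wide = pd.concat(frames, axis=1)
by_groupby = wide.T.groupby(level=0).sum().T.sort_index()

# Variant 2 (alternative): collect the frames per product, stack each
# product's frames along the rows, then place the products side by side.
by_product = {}
for frame in frames:
    name = frame.columns[0]  # assumes one data column per frame
    by_product.setdefault(name, []).append(frame)
by_stacking = pd.concat(
    [pd.concat(parts) for parts in by_product.values()], axis=1
).sort_index()

print(by_groupby)
print(by_stacking)
```

One difference worth knowing: with the default `sum()`, a date that has no value at all for a product comes out as 0.0 in the first variant, whereas the second variant leaves it as NaN. Passing `min_count=1` to `sum()` keeps NaN in the first variant as well.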