I have a series of CSV files that I load into DataFrames and store in a list (dataframesArray). The list and its DataFrames look like this:
dataframesArray [
BBG.XAMS.UL.S_pnl_pos_cost
date
2015-03-23 0.000000
2015-03-24 0.000000
2015-03-25 -0.674717
2015-03-26 69.140999
2015-03-27 -70.128728,
BBG.XAMS.UNA.S_pnl_pos_cost
date
2015-03-23 -0.674929
2015-03-24 -15.138444
2015-03-25 90.830662
2015-03-26 21.446129
2015-03-27 -2.554376,
BBG.XAMS.UL.S_pnl_pos_cost
date
2014-10-20 -15.220730
2014-10-21 3031.610010
2014-10-22 1976.815412
2014-10-23 -2974.037294
2014-10-24 796.775000,
BBG.XAMS.UNA.S_pnl_pos_cost
date
2014-10-20 -4.140378
2014-10-21 618.064066
2014-10-22 -71.104800
2014-10-23 828.063647
2014-10-24 0.000000]
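The data covers dates for two products (BBG.XAMS.UL.S_pnl_pos_cost and BBG.XAMS.UNA.S_pnl_pos_cost), and more products will be added in the future. I want to concatenate or merge (I'm not sure which) the list of DataFrames into a single DataFrame (called result), so that it looks like this: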
BBG.XAMS.UL.S_pnl_pos_cost BBG.XAMS.UNA.S_pnl_pos_cost
date
2014-10-20 -15.220730 -4.140378
2014-10-21 3031.610010 618.064066
2014-10-22 1976.815412 -71.104800
2014-10-23 -2974.037294 828.063647
2014-10-24 796.775000 0.000000
2015-03-23 0.000000 -0.674929
2015-03-24 0.000000 -15.138444
2015-03-25 -0.674717 90.830662
2015-03-26 69.140999 21.446129
2015-03-27 -70.128728 -2.554376
I tried to do this with:
result = pd.concat(dataframesArray,axis=1)
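where I intended axis=1 to line the frames up on the date index. The data does appear to be merged by date, but I'm missing the data for the week starting 2015-03-23. My current concat result looks like this: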
BBG.XAMS.UL.S_pnl_pos_cost BBG.XAMS.UNA.S_pnl_pos_cost
date
2014-10-20 -15.220730 -4.140378
2014-10-21 3031.610010 618.064066
2014-10-22 1976.815412 -71.104800
2014-10-23 -2974.037294 828.063647
2014-10-24 796.775000 0.000000
2015-03-23 NaN NaN
2015-03-24 NaN NaN
2015-03-25 NaN NaN
2015-03-26 NaN NaN
2015-03-27 NaN NaN
My code currently is:
stockPricesDf=pd.read_csv(f,engine='c',header=0,index_col=0, parse_dates=True, infer_datetime_format=True,usecols=(0,3))
stockPricesDf.rename(columns={'adjusted_last_acc': row},inplace=True)
dataframesArray.append(stockPricesDf)
result = pd.concat(dataframesArray,axis=1)
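I loop over several directories to pick up the product data stored in the CSV files.
Can someone tell me what I'm doing wrong and how to fix it?
Many thanks
Answer 0: (score: 2)
Try this: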
result = pd.concat(dataframesArray, axis=1) # like you did
result = result.groupby(result.columns, axis=1).sum()
As you can see, the first step produces this (the numbers are illustrative):
UL UNA UL UNA
2015-03-23 2.169534 0.294107 NaN NaN
2015-03-24 -0.077550 -0.758760 NaN NaN
2015-03-25 0.159659 -3.167541 NaN NaN
2015-03-26 0.895535 0.944644 NaN NaN
2015-03-27 -0.385408 -0.005069 NaN NaN
2015-10-20 NaN NaN 1.855446 -0.229635
2015-10-21 NaN NaN -0.400450 -0.237323
2015-10-22 NaN NaN 1.103165 0.718134
2015-10-23 NaN NaN -0.157415 1.119828
2015-10-24 NaN NaN -0.016321 -0.371061
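The block-shaped NaN pattern above can be reproduced with a small sketch. The short column names (UL, UNA), the helper make_frame, and the truncated two-row samples are assumptions made for illustration. They are not part of the original code, which reads each frame from a CSV file.

```python
import pandas as pd


def make_frame(column, dates, values):
    """Hypothetical helper: build a one-column frame indexed by date,
    standing in for a frame loaded with pd.read_csv."""
    index = pd.DatetimeIndex(dates, name="date")
    return pd.DataFrame({column: values}, index=index)


# Two products, two files each (one file per week), as in the question.
frames = [
    make_frame("UL", ["2015-03-23", "2015-03-24"], [0.0, -0.674717]),
    make_frame("UNA", ["2015-03-23", "2015-03-24"], [-0.674929, -15.138444]),
    make_frame("UL", ["2014-10-20", "2014-10-21"], [-15.220730, 3031.610010]),
    make_frame("UNA", ["2014-10-20", "2014-10-21"], [-4.140378, 618.064066]),
]

# axis=1 places every frame side by side as its own column and aligns rows
# on the date index. It does not merge columns that share a name.
wide = pd.concat(frames, axis=1).sort_index()

print(wide.columns.tolist())  # ['UL', 'UNA', 'UL', 'UNA']
print(wide.shape)             # (4, 4)
print(wide)
```

Each input frame only has values for its own week, so every column is NaN on the dates that came from the other files. Looking at just the first UL and UNA columns gives the impression that a whole week of data went missing.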
The second step groups the columns that share the same name into a single column:
UL UNA
2015-03-23 2.169534 0.294107
2015-03-24 -0.077550 -0.758760
2015-03-25 0.159659 -3.167541
2015-03-26 0.895535 0.944644
2015-03-27 -0.385408 -0.005069
2015-10-20 1.855446 -0.229635
2015-10-21 -0.400450 -0.237323
2015-10-22 1.103165 0.718134
2015-10-23 -0.157415 1.119828
2015-10-24 -0.016321 -0.371061
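For anyone who wants to run both steps end to end, here is a minimal sketch. It groups on the transposed frame rather than passing axis=1 to groupby, because that argument is deprecated in recent pandas releases. The function name collapse_duplicate_columns, the helper make_frame, and the shortened sample data are assumptions made for illustration.

```python
import pandas as pd


def collapse_duplicate_columns(wide):
    """Merge columns that share a name by summing them row by row.

    After transposing, the duplicated column labels become the index,
    so a plain groupby(level=0) can group on them. min_count=1 leaves a
    cell as NaN when every duplicate column is NaN on that date, instead
    of silently turning it into 0.
    """
    return wide.T.groupby(level=0).sum(min_count=1).T


def make_frame(column, dates, values):
    """Hypothetical helper standing in for a frame loaded from one CSV."""
    index = pd.DatetimeIndex(dates, name="date")
    return pd.DataFrame({column: values}, index=index)


frames = [
    make_frame("UL", ["2015-03-23", "2015-03-24"], [0.0, -0.674717]),
    make_frame("UNA", ["2015-03-23", "2015-03-24"], [-0.674929, -15.138444]),
    make_frame("UL", ["2014-10-20", "2014-10-21"], [-15.220730, 3031.610010]),
    make_frame("UNA", ["2014-10-20", "2014-10-21"], [-4.140378, 618.064066]),
]

# Step 1: side-by-side concat, which yields duplicate column names.
wide = pd.concat(frames, axis=1).sort_index()

# Step 2: fold the duplicates into one column per product.
result = collapse_duplicate_columns(wide)
print(result)
```

One caveat with summing: if the same date shows up in two files for the same product, the two values are added together. If that can happen in your data, it may be safer to check for overlapping dates before collapsing the columns.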