根据每个其他列的值合并大型数据框

时间:2017-09-01 12:18:55

标签: python pandas

我该怎么做?我在.csv文件中有以下数据集:

    +------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
    |    Date    | NBDG LN Equity |    Date    | P2P LN Equity |    Date    | HWSL LN Equity |    Date    | BPCR LN Equity |    Date    | AXI LN Equity |
    +------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
    | 09-08-2017 |           78,5 | 09-08-2017 |       877,061 | 09-08-2017 |        107,082 | 09-08-2017 |         1,0981 | 08-08-2017 |            94 |
    | 08-08-2017 |           78,5 | 08-08-2017 |      878,7899 | 08-08-2017 |          106,5 | 08-08-2017 |         1,1021 | 07-08-2017 |            94 |
    | 03-08-2017 |           78,5 | 07-08-2017 |       879,709 | 07-08-2017 |          106,2 | 07-08-2017 |         1,0945 | 02-08-2017 |       98,2472 |
    | 01-08-2017 |           78,5 | 04-08-2017 |      879,6708 | 04-08-2017 |       105,4882 | 04-08-2017 |         1,0932 | 27-07-2017 |          98,5 |
    +------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+

我想"合并"进入格式:

+------------+----------------+---------------+----------------+----------------+---------------+
|    Date    | NBDG LN Equity | P2P LN Equity | HWSL LN Equity | BPCR LN Equity | AXI LN Equity |
+------------+----------------+---------------+----------------+----------------+---------------+
| 09-08-2017 | 78,5           | 877,061       | 107,082        | 1,0981         | NA            |
| 08-08-2017 | 78,5           | 878,7899      | 106,5          | 1,1021         | 94            |
| 07-08-2017 | NA             | 879,709       | 106,2          | 1,0945         | 94            |
| 04-08-2017 | NA             | 879,6708      | 105,4882       | 1,0932         | NA            |
| 03-08-2017 | 78,5           | NA            | NA             | NA             | NA            |
| 02-08-2017 | NA             | NA            | NA             | NA             | 98,2472       |
| 01-08-2017 | 78,5           | NA            | NA             | NA             | NA            |
| 27-07-2017 | NA             | NA            | NA             | NA             | 98,5          |
+------------+----------------+---------------+----------------+----------------+---------------+

如果没有太多硬编码,我怎么能这样做?我开始使用

按唯一行排序
dfData = local_csv('Data.csv', timezone='DK', sep=';')
lDateColumns = [col for col in dfData.columns if 'Date' in col]
dfData[dfData[lDateColumns].apply(pd.Series.nunique, axis=1)==1]

直到我注意到索引有时相对于彼此偏移,导致只剩下4行。

由于

1 个答案:

答案 0 :(得分:0)

我逐个分解数据帧(更确切地说,2列2列),然后将所有内容合并在一起:

In [103]: df
Out[103]: 
         Date NBDG LN Equity      Date.1 P2P LN Equity      Date.2  \
0  09-08-2017           78,5  09-08-2017       877,061  09-08-2017   
1  08-08-2017           78,5  08-08-2017      878,7899  08-08-2017   
2  03-08-2017           78,5  07-08-2017       879,709  07-08-2017   
3  01-08-2017           78,5  04-08-2017      879,6708  04-08-2017   

  HWSL LN Equity      Date.3 BPCR LN Equity      Date.4 AXI LN Equity  
0        107,082  09-08-2017         1,0981  08-08-2017            94  
1          106,5  08-08-2017         1,1021  07-08-2017            94  
2          106,2  07-08-2017         1,0945  02-08-2017       98,2472  
3       105,4882  04-08-2017         1,0932  27-07-2017          98,5

In [114]: res = []

In [115]: for i in range(5):
     ...:     df_temp = pd.concat([df.iloc[:, 2*i], df.iloc[:, 2*i+1]], axis=1)
     ...:     df_temp.columns = ['Date', df_temp.columns[1]]
     ...:     res.append(df_temp)
     ...:     

我们现在有一个数据帧数组,其第一列始终是日期(并称为“日期”),第二列是相关指标。我们将使用functools.reduce

将所有内容合并在一起
In [117]: from functools import reduce

In [120]: reduce(lambda df1,df2: df1.merge(df2, on='Date', how='outer'), res)
Out[120]: 
         Date NBDG LN Equity P2P LN Equity HWSL LN Equity BPCR LN Equity  \
0  09-08-2017           78,5       877,061        107,082         1,0981   
1  08-08-2017           78,5      878,7899          106,5         1,1021   
2  03-08-2017           78,5           NaN            NaN            NaN   
3  01-08-2017           78,5           NaN            NaN            NaN   
4  07-08-2017            NaN       879,709          106,2         1,0945   
5  04-08-2017            NaN      879,6708       105,4882         1,0932   
6  02-08-2017            NaN           NaN            NaN            NaN   
7  27-07-2017            NaN           NaN            NaN            NaN   

  AXI LN Equity  
0           NaN  
1            94  
2           NaN  
3           NaN  
4            94  
5           NaN  
6       98,2472  
7          98,5