How can I do this? I have the following dataset in a .csv file:
+------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
| Date | NBDG LN Equity | Date | P2P LN Equity | Date | HWSL LN Equity | Date | BPCR LN Equity | Date | AXI LN Equity |
+------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
| 09-08-2017 | 78,5 | 09-08-2017 | 877,061 | 09-08-2017 | 107,082 | 09-08-2017 | 1,0981 | 08-08-2017 | 94 |
| 08-08-2017 | 78,5 | 08-08-2017 | 878,7899 | 08-08-2017 | 106,5 | 08-08-2017 | 1,1021 | 07-08-2017 | 94 |
| 03-08-2017 | 78,5 | 07-08-2017 | 879,709 | 07-08-2017 | 106,2 | 07-08-2017 | 1,0945 | 02-08-2017 | 98,2472 |
| 01-08-2017 | 78,5 | 04-08-2017 | 879,6708 | 04-08-2017 | 105,4882 | 04-08-2017 | 1,0932 | 27-07-2017 | 98,5 |
+------------+----------------+------------+---------------+------------+----------------+------------+----------------+------------+---------------+
I would like to "merge" it into the following format:
+------------+----------------+---------------+----------------+----------------+---------------+
| Date | NBDG LN Equity | P2P LN Equity | HWSL LN Equity | BPCR LN Equity | AXI LN Equity |
+------------+----------------+---------------+----------------+----------------+---------------+
| 09-08-2017 | 78,5 | 877,061 | 107,082 | 1,0981 | NA |
| 08-08-2017 | 78,5 | 878,7899 | 106,5 | 1,1021 | 94 |
| 07-08-2017 | NA | 879,709 | 106,2 | 1,0945 | 94 |
| 04-08-2017 | NA | 879,6708 | 105,4882 | 1,0932 | NA |
| 03-08-2017 | 78,5 | NA | NA | NA | NA |
| 02-08-2017 | NA | NA | NA | NA | 98,2472 |
| 01-08-2017 | 78,5 | NA | NA | NA | NA |
| 27-07-2017 | NA | NA | NA | NA | 98,5 |
+------------+----------------+---------------+----------------+----------------+---------------+
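How can I do this without too much hardcoding? I started by keeping only the rows in which every date column holds the same date:
dfData = local_csv('Data.csv', timezone='DK', sep=';')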
lDateColumns = [col for col in dfData.columns if 'Date' in col]
dfData[dfData[lDateColumns].apply(pd.Series.nunique, axis=1)==1]
until I noticed that the date columns are sometimes offset relative to each other, so the filter leaves only 4 rows.
Answer 0 (score: 0)
I split the dataframe into pieces (more precisely, two columns at a time) and then merge everything back together:
In [103]: df
Out[103]:
Date NBDG LN Equity Date.1 P2P LN Equity Date.2 \
0 09-08-2017 78,5 09-08-2017 877,061 09-08-2017
1 08-08-2017 78,5 08-08-2017 878,7899 08-08-2017
2 03-08-2017 78,5 07-08-2017 879,709 07-08-2017
3 01-08-2017 78,5 04-08-2017 879,6708 04-08-2017
HWSL LN Equity Date.3 BPCR LN Equity Date.4 AXI LN Equity
0 107,082 09-08-2017 1,0981 08-08-2017 94
1 106,5 08-08-2017 1,1021 07-08-2017 94
2 106,2 07-08-2017 1,0945 02-08-2017 98,2472
3 105,4882 04-08-2017 1,0932 27-07-2017 98,5
In [114]: res = []
In [115]: for i in range(5):
     ...:     df_temp = pd.concat([df.iloc[:, 2*i], df.iloc[:, 2*i+1]], axis=1)
     ...:     df_temp.columns = ['Date', df_temp.columns[1]]
     ...:     res.append(df_temp)
     ...:
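The loop above hardcodes five column pairs. Since the question asks for as little hardcoding as possible, here is a minimal sketch that derives the pairs from the column order instead. It assumes the columns always alternate date, value, date, value; the helper name `split_pairs` and the small stand-in frame are made up for illustration.

```python
import pandas as pd

# Small stand-in for the frame shown above (two of the five pairs).
df = pd.DataFrame({
    "Date": ["09-08-2017", "08-08-2017", "03-08-2017"],
    "NBDG LN Equity": ["78,5", "78,5", "78,5"],
    "Date.1": ["08-08-2017", "07-08-2017", "02-08-2017"],
    "AXI LN Equity": ["94", "94", "98,2472"],
})


def split_pairs(frame):
    """Split a (date, value, date, value, ...) frame into two-column frames."""
    pairs = []
    # Even positions hold the dates, odd positions hold the matching series.
    for date_col, value_col in zip(frame.columns[0::2], frame.columns[1::2]):
        pair = frame[[date_col, value_col]].copy()
        pair.columns = ["Date", value_col]
        # Shorter series are padded with NaN when read from CSV; drop that padding.
        pairs.append(pair.dropna(subset=["Date"]))
    return pairs


res = split_pairs(df)
```

The resulting `res` has the same shape as the list built by the loop, so the rest of the approach should apply unchanged.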
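We now have a list of dataframes whose first column is always the date (named 'Date') and whose second column is the corresponding series. We then merge them pairwise with functools.reduce: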
In [117]: from functools import reduce
In [120]: reduce(lambda df1,df2: df1.merge(df2, on='Date', how='outer'), res)
Out[120]:
Date NBDG LN Equity P2P LN Equity HWSL LN Equity BPCR LN Equity \
0 09-08-2017 78,5 877,061 107,082 1,0981
1 08-08-2017 78,5 878,7899 106,5 1,1021
2 03-08-2017 78,5 NaN NaN NaN
3 01-08-2017 78,5 NaN NaN NaN
4 07-08-2017 NaN 879,709 106,2 1,0945
5 04-08-2017 NaN 879,6708 105,4882 1,0932
6 02-08-2017 NaN NaN NaN NaN
7 27-07-2017 NaN NaN NaN NaN
AXI LN Equity
0 NaN
1 94
2 NaN
3 NaN
4 94
5 NaN
6 98,2472
7 98,5
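The merged result above is not sorted by date, and the values are still strings with decimal commas, whereas the format requested in the question is ordered from newest to oldest. The following is a rough sketch of one possible post-processing step, assuming the dates are always day-month-year; the two small frames only stand in for the list `res`.

```python
from functools import reduce

import pandas as pd

# Two small two-column frames standing in for the list `res` built above.
res = [
    pd.DataFrame({"Date": ["09-08-2017", "08-08-2017", "03-08-2017"],
                  "NBDG LN Equity": ["78,5", "78,5", "78,5"]}),
    pd.DataFrame({"Date": ["08-08-2017", "07-08-2017", "02-08-2017"],
                  "AXI LN Equity": ["94", "94", "98,2472"]}),
]

merged = reduce(lambda left, right: left.merge(right, on="Date", how="outer"), res)

# Parse day-month-year strings so that sorting is chronological, not lexical.
merged["Date"] = pd.to_datetime(merged["Date"], format="%d-%m-%Y")

# Turn decimal-comma strings such as "98,2472" into floats; missing values stay NaN.
for col in merged.columns.drop("Date"):
    merged[col] = pd.to_numeric(merged[col].str.replace(",", ".", regex=False))

# Newest date first, as in the requested layout.
merged = merged.sort_values("Date", ascending=False).reset_index(drop=True)
print(merged)
```

If the file is read with `pd.read_csv`, passing `decimal=','` should make the string-to-float conversion unnecessary.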