The actual .csv data file:
Date MTM_B7_1 Date MTM_B7_11 Date MTM_B7_12
03/01/11 AM 2084 04/01/11 AM -8166 04/01/11 AM -8332
04/01/11 AM -9066 05/01/11 AM 28613 05/01/11 AM -8750
05/01/11 AM 103607 06/01/11 AM 35605 06/01/11 AM -21307
10/01/11 AM 68538 07/01/11 AM 55608 07/01/11 AM 151609
Reproducible example:
import pandas as pd
# date_range takes no format argument; the start string is parsed month-first (January 3, 2011)
df = pd.DataFrame({'Date': pd.date_range('01/03/11', periods=8),
                   'MTM1': [2, 3, 4, 5, 6, 7, 8, 9],
                   'Date1': pd.date_range('01/07/11', periods=8),
                   'MTM2': [4, 3, 5, -7, 6, -9, 8, 9]
                   })
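I have a .csv file in the format above. I want to combine all the Date columns into a single date index, keep each MTM column's values aligned to its own dates, and drop the remaining Date columns, so that all MTM columns share one date index.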
Answer 0 (score: 1)
Consider splitting the data frame into a list of data frames, one per Date/MTM pair, and then chaining outer-join merges with reduce():
from functools import reduce
import pandas as pd
# date_range takes no format argument; the start string is parsed month-first
df = pd.DataFrame({'Date': pd.date_range('01/03/11', periods=8),
                   'MTM1': [2, 3, 4, 5, 6, 7, 8, 9],
                   'Date1': pd.date_range('01/07/11', periods=8),
                   'MTM2': [4, 3, 5, -7, 6, -9, 8, 9]
                   })
# FOR THE TWO POSTED PAIRS
dfList = [df[['Date', 'MTM1']],
          df[['Date1', 'MTM2']].rename(columns={'Date1': 'Date'})]
# FOR MANY PAIRS (ASSUMES DATE AND MTM COLUMNS APPEAR IN MATCHING ORDER)
# This builds the same list generically and replaces the one above.
dfList = [df[[d, m]].rename(columns={d: 'Date'})
          for d, m in zip([c for c in df.columns if 'Date' in c],
                          [c for c in df.columns if 'MTM' in c])]
# Outer-join every pair on Date so that no date from any pair is dropped
df = reduce(lambda left, right: pd.merge(left, right, on=['Date'], how='outer'),
            dfList).set_index('Date')
print(df)
#             MTM1  MTM2
# Date
# 2011-01-03   2.0   NaN
# 2011-01-04   3.0   NaN
# 2011-01-05   4.0   NaN
# 2011-01-06   5.0   NaN
# 2011-01-07   6.0   4.0
# 2011-01-08   7.0   3.0
# 2011-01-09   8.0   5.0
# 2011-01-10   9.0  -7.0
# 2011-01-11   NaN   6.0
# 2011-01-12   NaN  -9.0
# 2011-01-13   NaN   8.0
# 2011-01-14   NaN   9.0
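The same pairing idea can be tried on rows shaped like the posted .csv sample. The sketch below is an alternative to the reduce()/merge chain, not part of the answer above: it indexes each MTM column by its own parsed dates and lets pd.concat align them along the columns. It rests on several assumptions. The labels 'Date.1' and 'Date.2' stand in for however the duplicate Date headers end up named after loading. The dates are treated as day-first, which is inferred from the consecutive January days in the sample. The trailing AM/PM marker is simply discarded. Dates must be unique within each pair for the alignment to work.

```python
import pandas as pd

# Rows copied from the posted sample. The labels 'Date.1' and 'Date.2' are an
# assumption about how the duplicate 'Date' headers are named after loading.
raw = pd.DataFrame({
    'Date':      ['03/01/11 AM', '04/01/11 AM', '05/01/11 AM', '10/01/11 AM'],
    'MTM_B7_1':  [2084, -9066, 103607, 68538],
    'Date.1':    ['04/01/11 AM', '05/01/11 AM', '06/01/11 AM', '07/01/11 AM'],
    'MTM_B7_11': [-8166, 28613, 35605, 55608],
    'Date.2':    ['04/01/11 AM', '05/01/11 AM', '06/01/11 AM', '07/01/11 AM'],
    'MTM_B7_12': [-8332, -8750, -21307, 151609],
})

# Pair columns by position: first Date with first MTM, and so on.
date_cols = [c for c in raw.columns if c.startswith('Date')]
mtm_cols = [c for c in raw.columns if c.startswith('MTM')]


def parse_dates(col):
    """Strip the trailing AM/PM marker and parse as day/month/two-digit year."""
    cleaned = col.str.replace(r'\s*(AM|PM)$', '', regex=True)
    return pd.to_datetime(cleaned, format='%d/%m/%y')


# One Series per pair, each indexed by its own parsed dates.
pieces = [
    pd.Series(raw[m].to_numpy(),
              index=pd.DatetimeIndex(parse_dates(raw[d])),
              name=m)
    for d, m in zip(date_cols, mtm_cols)
]

# axis=1 aligns the Series on the union of their dates (an outer join).
combined = pd.concat(pieces, axis=1).sort_index().rename_axis('Date')
print(combined)
```

Dates that appear in only one pair get NaN in the other columns, just as in the merged result above, so the integer MTM values come back as floats.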