How to merge multiple date columns into one in pandas?

Asked: 2018-09-25 10:38:11

Tags: pandas dataframe

I have the dataframe below, which contains multiple date columns and their corresponding values:

date         value_1      date        value_2    date         value_3
01-01-1990   1            01-01-1990  2          02-01-1990   4
02-01-1990   3            03-01-1990  20         
                          04-01-1990  30  

Desired output: merge all the date columns into one superset date column and show the values accordingly.

date         value_1        value_2    value_3
01-01-1990   1              2          
02-01-1990   3                         4
03-01-1990                  20
04-01-1990                  30 
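
For reference, a dataframe with repeated 'date' column names like the one above can be constructed for testing roughly as follows (a minimal sketch; the values come from the sample tables, with NaN padding the shorter pairs):

import numpy as np
import pandas as pd

# three date/value pairs, all sharing the column name 'date'
df = pd.DataFrame(
    [['01-01-1990', 1, '01-01-1990', 2, '02-01-1990', 4],
     ['02-01-1990', 3, '03-01-1990', 20, np.nan, np.nan],
     [np.nan, np.nan, '04-01-1990', 30, np.nan, np.nan]],
    columns=['date', 'value_1', 'date', 'value_2', 'date', 'value_3'])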

2 answers:

Answer 0: (score: 3)

First you need to deduplicate the repeated column names, so that each date column is paired with its value column under a unique name:

# flag every column name that occurs more than once (here the repeated 'date')
s = df.columns.to_series()
mask = df.columns.duplicated(keep=False)
# append a running counter (_1, _2, ...) to each duplicated name
c = np.where(mask, s + '_' + (s.groupby(s).cumcount() + 1).astype(str), s)
df.columns = c
print (df)
       date_1  value_1      date_2  value_2      date_3  value_3
0  01-01-1990      1.0  01-01-1990        2  02-01-1990      4.0
1  02-01-1990      3.0  03-01-1990       20         NaN      NaN
2         NaN      NaN  04-01-1990       30         NaN      NaN

Then loop by groupby with a lambda function that splits the column names into pairs, set the date column of each pair as the index, drop missing values, and finally concat everything together:

# group the columns in pairs by their numeric suffix ('date_1'/'value_1' -> '1', ...)
dfs = [x.set_index(x.columns[0]).dropna()
       for i, x in df.groupby(lambda x: x.split('_')[1], axis=1)]
#print (dfs)

df2 = pd.concat(dfs, axis=1)
print (df2)
            value_1  value_2  value_3
01-01-1990      1.0      2.0      NaN
02-01-1990      3.0      NaN      4.0
03-01-1990      NaN     20.0      NaN
04-01-1990      NaN     30.0      NaN
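
If the merged dates should end up as a regular, chronologically sorted column again, a small follow-up sketch (assuming the index holds day-first date strings as in the sample):

# assumption: index entries look like '01-01-1990' in day-month-year order
df2.index = pd.to_datetime(df2.index, format='%d-%m-%Y')
df2 = df2.sort_index().rename_axis('date').reset_index()
print (df2)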

EDIT:

The answer was changed for the case of a datetime column followed by 2 data value columns per group:

print (df)
  date_security  GH_LAST_PRICE Val  GH_VOLUME_PRICE Val date_security  \
0    01-01-1990                1.0                  7.0    01-01-1990   
1    01-02-1990                3.0                  8.0    03-01-1990   
2           NaN                NaN                  NaN    04-01-1990   

   DG_LAST_PRICE Val  DG_VOLUME_PRICE Val  
0                  2                 10.0  
1                 20                  NaN  
2                 30                  1.0 

Create a MultiIndex:

# every block of 3 consecutive columns (date + 2 value columns) gets the same first level: '0', '1', ...
df.columns = [(np.arange(len(df.columns)) // 3).astype(str), df.columns]
print (df)
               0                                                   1  \
  date_security GH_LAST_PRICE Val GH_VOLUME_PRICE Val date_security   
0    01-01-1990               1.0                 7.0    01-01-1990   
1    01-02-1990               3.0                 8.0    03-01-1990   
2           NaN               NaN                 NaN    04-01-1990   


  DG_LAST_PRICE Val DG_VOLUME_PRICE Val  
0                 2                10.0  
1                20                 NaN  
2                30                 1.0  

# group by the first level of the MultiIndex, use each group's date column as index, drop missing rows
dfs = [x.set_index(x.columns[0]).dropna()
       for i, x in df.groupby(level=0, axis=1)]

df2 = pd.concat(dfs, axis=1)
# flatten the MultiIndex column names into single strings
df2.columns = df2.columns.map('_'.join)
print (df2)
            0_GH_LAST_PRICE Val  0_GH_VOLUME_PRICE Val  1_DG_LAST_PRICE Val  \
01-01-1990                  1.0                    7.0                  2.0   
01-02-1990                  3.0                    8.0                  NaN   
04-01-1990                  NaN                    NaN                 30.0   

            1_DG_VOLUME_PRICE Val  
01-01-1990                   10.0  
01-02-1990                    NaN  
04-01-1990                    1.0  
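
If the numeric group prefix ('0_', '1_') in the flattened names is unwanted, it can be stripped afterwards (a small sketch based on the column names shown above):

# drop the leading group number from each flattened column name
df2.columns = [c.split('_', 1)[1] for c in df2.columns]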

Answer 1: (score: 0)

One approach is to stack the date/value pairs vertically into one dataframe:

# give the duplicated date columns unique names first
df.columns = ['date_1', 'value_1', 'date_2', 'value_2', 'date_3', 'value_3']

new_df = pd.concat([df[['date_1', 'value_1']].rename(columns={'date_1': 'date'}),
                    df[['date_2', 'value_2']].rename(columns={'date_2': 'date'}),
                    df[['date_3', 'value_3']].rename(columns={'date_3': 'date'})]).dropna(how='all')

         date  value_1  value_2  value_3
0  01-01-1990      1.0      NaN      NaN
1  02-01-1990      3.0      NaN      NaN
2  01-01-1990      NaN      2.0      NaN
3  03-01-1990      NaN     20.0      NaN
4  04-01-1990      NaN     30.0      NaN
5  02-01-1990      NaN      NaN      4.0

Then group by date:

# within each date group fill values in both directions, then keep one row per date
new_df.groupby('date', as_index=False).apply(lambda x: x.ffill().bfill().drop_duplicates())

           date  value_1  value_2  value_3
0 0  01-01-1990      1.0      2.0      NaN
1 1  02-01-1990      3.0      NaN      4.0
2 3  03-01-1990      NaN     20.0      NaN
3 4  04-01-1990      NaN     30.0      NaN
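
The doubled index ('0 0', '1 1', ...) is the group key that groupby(...).apply() stacks on top of the original row index; if a flat index is preferred, a possible cleanup sketch (reusing new_df from above):

result = (new_df.groupby('date', as_index=False)
                .apply(lambda x: x.ffill().bfill().drop_duplicates())
                .reset_index(drop=True))
print (result)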