熊猫创建新的数据框,其中2分钟数据与1分钟对齐

时间:2019-04-11 13:47:57

标签: python pandas

这是相同的csv:

Time,o1,h1,l1,c1,o2,h2,l2,c2
        2019.04.08 23:59:00,1.12582,1.1261,1.12582,1.12609,1.12588,1.12591,1.12587,1.12588
        2019.04.09 00:00:00,1.12587,1.12598,1.12587,1.12598,1.12583,1.1261,1.12582,1.12609
        2019.04.09 00:01:00,1.12599,1.12607,1.12587,1.12607,1.12598,1.12607,1.12587,1.12607
        2019.04.09 00:02:00,1.12607,1.12632,1.12605,1.12632,1.12598,1.12607,1.12587,1.12607
        2019.04.09 00:03:00,1.12603,1.12614,1.12603,1.12613,1.12607,1.12632,1.12603,1.12613
        2019.04.09 00:04:00,1.12613,1.12613,1.12609,1.12611,1.12607,1.12632,1.12603,1.12613
        2019.04.09 00:05:00,1.12612,1.12613,1.12612,1.12613,1.12613,1.12613,1.12609,1.12613
        2019.04.09 00:06:00,1.12613,1.12613,1.12613,1.12613,1.12613,1.12613,1.12609,1.12613
        2019.04.09 00:07:00,1.12613,1.12613,1.1261,1.12612,1.12613,1.12613,1.1261,1.12612
        2019.04.09 00:08:00,1.12612,1.12612,1.12612,1.12612,1.12613,1.12613,1.1261,1.12612
        2019.04.09 00:09:00,1.12612,1.12612,1.12609,1.12609,1.12612,1.12612,1.12609,1.12609
        2019.04.09 00:10:00,1.12609,1.12609,1.12602,1.12607,1.12612,1.12612,1.12609,1.12609
        2019.04.09 00:11:00,1.12607,1.12609,1.12607,1.12608,1.12609,1.12609,1.12602,1.12608
        2019.04.09 00:12:00,1.12608,1.12608,1.12608,1.12608,1.12609,1.12609,1.12602,1.12608
        2019.04.09 00:13:00,1.12608,1.12608,1.126,1.126,1.12608,1.12608,1.126,1.126
        2019.04.09 00:14:00,1.126,1.126,1.1259,1.12592,1.12608,1.12608,1.126,1.126
        2019.04.09 00:15:00,1.12592,1.12614,1.12592,1.12613,1.126,1.12614,1.1259,1.12613
        2019.04.09 00:16:00,1.12614,1.12619,1.12612,1.12619,1.126,1.12614,1.1259,1.12613
        2019.04.09 00:17:00,1.12619,1.1262,1.12615,1.12615,1.12614,1.1262,1.12612,1.12615
        2019.04.09 00:18:00,1.12616,1.12616,1.12608,1.12608,1.12614,1.1262,1.12612,1.12615
        2019.04.09 00:19:00,1.12608,1.12611,1.12608,1.12611,1.12616,1.12616,1.12608,1.12611
        2019.04.09 00:20:00,1.12608,1.12637,1.12608,1.1262,1.12616,1.12616,1.12608,1.12611
        2019.04.09 00:21:00,1.1262,1.12631,1.12614,1.12623,1.12608,1.12637,1.12608,1.12623
        2019.04.09 00:22:00,1.12624,1.12646,1.12619,1.12626,1.12608,1.12637,1.12608,1.12623
        2019.04.09 00:23:00,1.12626,1.12628,1.12585,1.1261,1.12624,1.12646,1.12585,1.1261
        2019.04.09 00:24:00,1.12609,1.12621,1.12607,1.12616,1.12624,1.12646,1.12585,1.1261
        2019.04.09 00:25:00,1.12615,1.12625,1.12609,1.1262,1.12609,1.12625,1.12607,1.1262
        2019.04.09 00:26:00,1.1262,1.12629,1.12615,1.12618,1.12609,1.12625,1.12607,1.1262
        2019.04.09 00:27:00,1.12618,1.12621,1.12614,1.12617,1.1262,1.12629,1.12614,1.12617
        2019.04.09 00:28:00,1.12618,1.12618,1.12586,1.12587,1.1262,1.12629,1.12614,1.12617
        2019.04.09 00:29:00,1.12586,1.12598,1.12586,1.12597,1.12618,1.12618,1.12586,1.12597
        2019.04.09 00:30:00,1.12598,1.12608,1.12594,1.12604,1.12618,1.12618,1.12586,1.12597

我想知道如何在Pandas Dataframe中做到这一点:

Time,o1,h1,l1,c1,o2,h2,l2,c2,o11,h11,l11,c11
2019.04.08 23:59:00,1.12582,1.1261,1.12582,1.12609,1.12588,1.12591,1.12587,1.12588,1.12587,1.12598,1.12587,1.12598
2019.04.09 00:01:00,1.12599,1.12607,1.12587,1.12607,1.12598,1.12607,1.12587,1.12607,1.12607,1.12632,1.12605,1.12632
2019.04.09 00:03:00,1.12603,1.12614,1.12603,1.12613,1.12607,1.12632,1.12603,1.12613,1.12613,1.12613,1.12609,1.12611
2019.04.09 00:05:00,1.12612,1.12613,1.12612,1.12613,1.12613,1.12613,1.12609,1.12613,1.12613,1.12613,1.12613,1.12613
2019.04.09 00:07:00,1.12613,1.12613,1.1261,1.12612,1.12613,1.12613,1.1261,1.12612,1.12612,1.12612,1.12612,1.12612
2019.04.09 00:09:00,1.12612,1.12612,1.12609,1.12609,1.12612,1.12612,1.12609,1.12609,1.12609,1.12609,1.12602,1.12607

我想要将2分钟到1分钟的数据附加到上面的示例中。
请让我知道。

2 个答案:

答案 0 :(得分:2)

Index.get_indexer用于列表中的列位置和DataFrame.iloc中的合适行,最后DataFrame.join一起:

pos2 = df.columns.get_indexer(['o1','h1','l1','c1'])
df1 = df.iloc[0::2].reset_index(drop=True)
df2 = df.iloc[1::2, pos2].reset_index(drop=True)

df = df1.join(df2, rsuffix='1')

print (df)

                   Time       o1       h1       l1       c1       o2       h2  \
0   2019.04.08 23:59:00  1.12582  1.12610  1.12582  1.12609  1.12588  1.12591   
1   2019.04.09 00:01:00  1.12599  1.12607  1.12587  1.12607  1.12598  1.12607   
2   2019.04.09 00:03:00  1.12603  1.12614  1.12603  1.12613  1.12607  1.12632   
3   2019.04.09 00:05:00  1.12612  1.12613  1.12612  1.12613  1.12613  1.12613   
4   2019.04.09 00:07:00  1.12613  1.12613  1.12610  1.12612  1.12613  1.12613   
5   2019.04.09 00:09:00  1.12612  1.12612  1.12609  1.12609  1.12612  1.12612   
6   2019.04.09 00:11:00  1.12607  1.12609  1.12607  1.12608  1.12609  1.12609   
7   2019.04.09 00:13:00  1.12608  1.12608  1.12600  1.12600  1.12608  1.12608   
8   2019.04.09 00:15:00  1.12592  1.12614  1.12592  1.12613  1.12600  1.12614   
9   2019.04.09 00:17:00  1.12619  1.12620  1.12615  1.12615  1.12614  1.12620   
10  2019.04.09 00:19:00  1.12608  1.12611  1.12608  1.12611  1.12616  1.12616   
11  2019.04.09 00:21:00  1.12620  1.12631  1.12614  1.12623  1.12608  1.12637   
12  2019.04.09 00:23:00  1.12626  1.12628  1.12585  1.12610  1.12624  1.12646   
13  2019.04.09 00:25:00  1.12615  1.12625  1.12609  1.12620  1.12609  1.12625   
14  2019.04.09 00:27:00  1.12618  1.12621  1.12614  1.12617  1.12620  1.12629   
15  2019.04.09 00:29:00  1.12586  1.12598  1.12586  1.12597  1.12618  1.12618   

         l2       c2      o11      h11      l11      c11  
0   1.12587  1.12588  1.12587  1.12598  1.12587  1.12598  
1   1.12587  1.12607  1.12607  1.12632  1.12605  1.12632  
2   1.12603  1.12613  1.12613  1.12613  1.12609  1.12611  
3   1.12609  1.12613  1.12613  1.12613  1.12613  1.12613  
4   1.12610  1.12612  1.12612  1.12612  1.12612  1.12612  
5   1.12609  1.12609  1.12609  1.12609  1.12602  1.12607  
6   1.12602  1.12608  1.12608  1.12608  1.12608  1.12608  
7   1.12600  1.12600  1.12600  1.12600  1.12590  1.12592  
8   1.12590  1.12613  1.12614  1.12619  1.12612  1.12619  
9   1.12612  1.12615  1.12616  1.12616  1.12608  1.12608  
10  1.12608  1.12611  1.12608  1.12637  1.12608  1.12620  
11  1.12608  1.12623  1.12624  1.12646  1.12619  1.12626  
12  1.12585  1.12610  1.12609  1.12621  1.12607  1.12616  
13  1.12607  1.12620  1.12620  1.12629  1.12615  1.12618  
14  1.12614  1.12617  1.12618  1.12618  1.12586  1.12587  
15  1.12586  1.12597  1.12598  1.12608  1.12594  1.12604  

编辑:通过DataFrame.asfreq以一分钟的频率添加缺失的行:

print (df)
                  Time       o1       h1       l1       c1       o2       h2  \
0  2019.04.10 16:16:00  1.12308  1.12340  1.12304  1.12327  1.12350  1.12369   
1  2019.04.10 16:17:00  1.12327  1.12333  1.12302  1.12322  1.12308  1.12340   
2  2019.04.10 16:21:00  1.12314  1.12354  1.12313  1.12353  1.12314  1.12354   
3  2019.04.10 16:22:00  1.12353  1.12362  1.12345  1.12359  1.12314  1.12354   
4  2019.04.10 16:23:00  1.12359  1.12389  1.12357  1.12383  1.12353  1.12389   
5  2019.04.10 16:24:00  1.12383  1.12396  1.12375  1.12384  1.12353  1.12389   
6  2019.04.10 16:25:00  1.12383  1.12389  1.12373  1.12376  1.12383  1.12396   
7  2019.04.10 16:26:00  1.12376  1.12386  1.12355  1.12361  1.12383  1.12396   

        l2       c2  
0  1.12300  1.12308  
1  1.12302  1.12322  
2  1.12313  1.12353  
3  1.12313  1.12353  
4  1.12345  1.12383  
5  1.12345  1.12383  
6  1.12373  1.12376  
7  1.12373  1.12376  

df['Time'] = pd.to_datetime(df['Time'])
df = df.set_index('Time').asfreq('1min').reset_index()
#if need forward filling missing values
#df = df.set_index('Time').asfreq('1min', method='ffill').reset_index()
print (df)
                  Time       o1       h1       l1       c1       o2       h2  \
0  2019-04-10 16:16:00  1.12308  1.12340  1.12304  1.12327  1.12350  1.12369   
1  2019-04-10 16:17:00  1.12327  1.12333  1.12302  1.12322  1.12308  1.12340   
2  2019-04-10 16:18:00      NaN      NaN      NaN      NaN      NaN      NaN   
3  2019-04-10 16:19:00      NaN      NaN      NaN      NaN      NaN      NaN   
4  2019-04-10 16:20:00      NaN      NaN      NaN      NaN      NaN      NaN   
5  2019-04-10 16:21:00  1.12314  1.12354  1.12313  1.12353  1.12314  1.12354   
6  2019-04-10 16:22:00  1.12353  1.12362  1.12345  1.12359  1.12314  1.12354   
7  2019-04-10 16:23:00  1.12359  1.12389  1.12357  1.12383  1.12353  1.12389   
8  2019-04-10 16:24:00  1.12383  1.12396  1.12375  1.12384  1.12353  1.12389   
9  2019-04-10 16:25:00  1.12383  1.12389  1.12373  1.12376  1.12383  1.12396   
10 2019-04-10 16:26:00  1.12376  1.12386  1.12355  1.12361  1.12383  1.12396   

         l2       c2  
0   1.12300  1.12308  
1   1.12302  1.12322  
2       NaN      NaN  
3       NaN      NaN  
4       NaN      NaN  
5   1.12313  1.12353  
6   1.12313  1.12353  
7   1.12345  1.12383  
8   1.12345  1.12383  
9   1.12373  1.12376  
10  1.12373  1.12376 

pos2 = df.columns.get_indexer(['o1','h1','l1','c1'])
df1 = df.iloc[0::2].reset_index(drop=True)
df2 = df.iloc[1::2, pos2].reset_index(drop=True)

df = df1.join(df2, rsuffix='1')
print (df)
                 Time       o1       h1       l1       c1       o2       h2  \
0 2019-04-10 16:16:00  1.12308  1.12340  1.12304  1.12327  1.12350  1.12369   
1 2019-04-10 16:18:00      NaN      NaN      NaN      NaN      NaN      NaN   
2 2019-04-10 16:20:00      NaN      NaN      NaN      NaN      NaN      NaN   
3 2019-04-10 16:22:00  1.12353  1.12362  1.12345  1.12359  1.12314  1.12354   
4 2019-04-10 16:24:00  1.12383  1.12396  1.12375  1.12384  1.12353  1.12389   
5 2019-04-10 16:26:00  1.12376  1.12386  1.12355  1.12361  1.12383  1.12396   

        l2       c2      o11      h11      l11      c11  
0  1.12300  1.12308  1.12327  1.12333  1.12302  1.12322  
1      NaN      NaN      NaN      NaN      NaN      NaN  
2      NaN      NaN  1.12314  1.12354  1.12313  1.12353  
3  1.12313  1.12353  1.12359  1.12389  1.12357  1.12383  
4  1.12345  1.12383  1.12383  1.12389  1.12373  1.12376  
5  1.12373  1.12376      NaN      NaN      NaN      NaN  

答案 1 :(得分:0)

分割数据然后加入:

# split data every other row
n = df[0::2]
n1 = df[1::2]

# reset_index and join
new_df = n.reset_index(drop=True).join(n1.reset_index(drop=True), rsuffix='_other')

然后删除您不想要的列