Insert values from one DataFrame into another DataFrame, where both DFs are indexed by date

Asked: 2016-10-04 21:41:13

Tags: pandas

I have df1:

                     GOOG  AAPL  XOM  IBM  _CASH
2011-01-13 16:00:00     0     0    0    0      0
2011-01-14 16:00:00     0     0    0    0      0
2011-01-18 16:00:00     0     0    0    0      0
2011-01-19 16:00:00     0     0    0    0      0
2011-01-20 16:00:00     0     0    0    0      0
...

and df2:

    year  month  day symbol order_type  shares  trans_date
0   2011      1   13   AAPL       Sell    1500  2011-01-13
1   2011      1   13    IBM        Buy    4000  2011-01-13
2   2011      1   26   GOOG        Buy    1000  2011-01-26
3   2011      2    2    XOM       Sell    4000  2011-02-02
4   2011      2   10    XOM        Buy    4000  2011-02-10
5   2011      3    3   GOOG       Sell    1000  2011-03-03
6   2011      3    3    IBM       Sell    2200  2011-03-03
7   2011      6    3    IBM       Sell    3300  2011-06-03
8   2011      5    3    IBM        Buy    1500  2011-05-03
9   2011      6   10   AAPL        Buy    1200  2011-06-10
10  2011      8    1   GOOG        Buy      55  2011-08-01
11  2011      8    1   GOOG       Sell      55  2011-08-01
12  2011     12   20   AAPL       Sell    1200  2011-12-20

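I want to insert each df2 'shares' value into df1, in the row whose date matches and the column whose name matches the df2 'symbol'. The sign of 'shares' depends on 'order_type': the value is positive for 'Buy' and negative for 'Sell'.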

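For example, for the first row of df2, the value -1500 would be inserted into the 'AAPL' column of df1, in the row where the df1 timestamp matches the df2 'trans_date'.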

Both df2's 'trans_date' and the df1 timestamps are set as the index.
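The operation described above can be sketched without loops as "sign the shares, pivot to one column per symbol, then add onto df1". This is only a minimal sketch under stated assumptions: the frames below are small hypothetical stand-ins for df1 and df2, 'trans_date' is taken to be a regular column that already carries the 16:00 time (call `reset_index()` first if it is the index), and same-day orders for the same symbol are summed.

```python
import pandas as pd

# Hypothetical miniature stand-ins for df1 and df2 from the question.
idx = pd.to_datetime(
    ["2011-01-13 16:00:00", "2011-01-14 16:00:00", "2011-01-26 16:00:00"]
)
df1 = pd.DataFrame(0, index=idx, columns=["GOOG", "AAPL", "XOM", "IBM", "_CASH"])

df2 = pd.DataFrame({
    "symbol": ["AAPL", "IBM", "GOOG"],
    "order_type": ["Sell", "Buy", "Buy"],
    "shares": [1500, 4000, 1000],
    "trans_date": pd.to_datetime(
        ["2011-01-13 16:00:00", "2011-01-13 16:00:00", "2011-01-26 16:00:00"]
    ),
})

# 'Buy' keeps the sign, 'Sell' flips it.
signed = df2["shares"] * df2["order_type"].map({"Buy": 1, "Sell": -1})

# One row per date, one column per symbol; orders on the same date
# for the same symbol are summed (an assumption about the desired result).
wide = signed.groupby([df2["trans_date"], df2["symbol"]]).sum().unstack(fill_value=0)

# Give the pivot exactly df1's labels, then add it onto df1.
wide = wide.reindex(index=df1.index, columns=df1.columns, fill_value=0)
result = (df1 + wide).astype(int)
print(result)
```

The `reindex` step matters: any date or symbol in the orders that does not appear in df1 is dropped there, and any df1 cell without an order stays at 0.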

I would appreciate your help.

I stacked df1 and merged it with df2 using this:

pd.merge(df_stacked_mx, df_orders, left_index=True, right_index=True, how='inner')

I get:

 level_0             level_1  0  year  month  day symbol order_type  shares  \
0  2011-01-13 16:00:00    GOOG  0  2011      1   13   AAPL       Sell    1500   
1  2011-01-13 16:00:00    AAPL  0  2011      1   13    IBM        Buy    4000   
2  2011-01-13 16:00:00     XOM  0  2011      1   26   GOOG        Buy    1000   
3  2011-01-13 16:00:00     IBM  0  2011      2    2    XOM       Sell    4000   
4  2011-01-13 16:00:00   _CASH  0  2011      2   10    XOM        Buy    4000   
5  2011-01-14 16:00:00    GOOG  0  2011      3    3   GOOG       Sell    1000   
6  2011-01-14 16:00:00    AAPL  0  2011      3    3    IBM       Sell    2200   
7  2011-01-14 16:00:00     XOM  0  2011      6    3    IBM       Sell    3300   
8  2011-01-14 16:00:00     IBM  0  2011      5    3    IBM        Buy    1500   
9  2011-01-14 16:00:00   _CASH  0  2011      6   10   AAPL        Buy    1200   
10 2011-01-18 16:00:00    GOOG  0  2011      8    1   GOOG        Buy      55   
11 2011-01-18 16:00:00    AAPL  0  2011      8    1   GOOG       Sell      55   
12 2011-01-18 16:00:00     XOM  0  2011     12   20   AAPL       Sell    1200   


            trans_date  
0  2011-01-13 16:00:00  
1  2011-01-13 16:00:00  
2  2011-01-26 16:00:00  
3  2011-02-02 16:00:00  
4  2011-02-10 16:00:00  
5  2011-03-03 16:00:00  
6  2011-03-03 16:00:00  
7  2011-06-03 16:00:00  
8  2011-05-03 16:00:00  
9  2011-06-10 16:00:00  
10 2011-08-01 16:00:00  
11 2011-08-01 16:00:00  
12 2011-12-20 16:00:00  

I don't know where to go next, and I need help.
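The output above suggests that both frames carried a plain 0, 1, 2, ... integer index at merge time, so `left_index=True, right_index=True` paired rows by position rather than by date and symbol (row 0 pairs GOOG with an AAPL order, for instance). A hedged sketch of merging on the actual keys instead follows; the small frames and the column name `position` are hypothetical, and both date columns are assumed to carry the same 16:00 time.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the frames in the question.
idx = pd.to_datetime(["2011-01-13 16:00:00", "2011-01-14 16:00:00"])
df1 = pd.DataFrame(0, index=idx, columns=["GOOG", "AAPL", "IBM"])

df_orders = pd.DataFrame({
    "symbol": ["AAPL", "IBM"],
    "order_type": ["Sell", "Buy"],
    "shares": [1500, 4000],
    "trans_date": pd.to_datetime(["2011-01-13 16:00:00", "2011-01-13 16:00:00"]),
})

# Long form of df1 with named key columns instead of level_0 / level_1.
stacked = (
    df1.stack()
    .rename_axis(["trans_date", "symbol"])
    .reset_index(name="position")
)

# Join on the keys themselves; a left join keeps every (date, symbol)
# cell of df1 and leaves NaN where no order exists.
merged = stacked.merge(df_orders, on=["trans_date", "symbol"], how="left")
print(merged)
```

With the keys named explicitly, a mismatch in the time component shows up immediately as an all-NaN 'shares' column rather than as silently mispaired rows.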

Drew

Thanks, Bob, for your GitHub contribution. Very elegant, and no loops. It would have taken me an age to arrive at that answer.

There is one problem. The update of the stacked df1 returns None. The indexes don't match:

My stacked df2 looks like this:

trans_date           symbol
2011-01-13 16:00:00  AAPL     -1500
                     IBM       4000
2011-01-26 16:00:00  GOOG      1000
2011-02-02 16:00:00  XOM      -4000
2011-02-10 16:00:00  XOM       4000
2011-03-03 16:00:00  GOOG     -1000
                     IBM      -2200
2011-05-03 16:00:00  IBM       1500
2011-06-03 16:00:00  IBM      -3300
2011-06-10 16:00:00  AAPL      1200
2011-08-01 16:00:00  GOOG         0
2011-12-20 16:00:00  AAPL     -1200

Yours is:

trans_date  symbol
2011-01-13  AAPL     -1500
            IBM       4000
2011-01-26  GOOG      1000
2011-02-02  XOM      -4000
2011-02-10  XOM       4000
2011-03-03  GOOG     -1000
            IBM      -2200
2011-05-03  IBM       1500
2011-06-03  IBM      -3300
2011-06-10  AAPL      1200
2011-08-01  GOOG         0
2011-12-20  AAPL     -1200
Name: shares, dtype: int64

Note the time component in my dates.

But my df1 has no times:

                GOOG  AAPL  XOM  IBM  _CASH
2011-01-13     0     0    0    0      0
2011-01-26     0     0    0    0      0
2011-02-02     0     0    0    0      0
2011-02-10     0     0    0    0      0
2011-03-03     0     0    0    0      0
2011-05-03     0     0    0    0      0
2011-06-03     0     0    0    0      0
2011-06-10     0     0    0    0      0
2011-08-01     0     0    0    0      0
2011-12-20     0     0    0    0      0
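Two separate things seem to be going on here. First, pandas `update` modifies the object in place and always returns None, so a None return is not an error by itself. Second, `update` aligns on index labels, and a label such as 2011-01-13 16:00:00 is not equal to 2011-01-13 00:00:00, so nothing gets written. The sketch below reproduces both effects with small hypothetical stand-ins; normalizing the dates is done only on a copy to show that matching labels are the deciding factor.

```python
import pandas as pd

# Stand-in for a df1 whose index carries dates only (midnight timestamps).
df1 = pd.DataFrame(
    0, index=pd.to_datetime(["2011-01-13", "2011-01-26"]), columns=["GOOG", "AAPL"]
)
stacked_df1 = df1.stack()

# Stand-in for the signed orders, whose dates carry a 16:00 time.
orders = pd.Series(
    [-1500, 1000],
    index=pd.MultiIndex.from_tuples(
        [
            (pd.Timestamp("2011-01-13 16:00:00"), "AAPL"),
            (pd.Timestamp("2011-01-26 16:00:00"), "GOOG"),
        ],
        names=["trans_date", "symbol"],
    ),
    name="shares",
)

# update() works in place and returns None, whether or not anything matched.
returned = stacked_df1.update(orders)
print(returned)

# No label matched (16:00 vs. midnight), so every value is still 0.
unchanged = (stacked_df1 == 0).all()
print(unchanged)

# Illustration only: with the time stripped on a copy, the labels agree
# and the same call writes the values.
orders_by_day = orders.copy()
orders_by_day.index = orders_by_day.index.set_levels(
    orders_by_day.index.levels[0].normalize(), level=0
)
stacked_df1.update(orders_by_day)
print(stacked_df1.unstack())
```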

This is confusing. If I create a df using all the timestamps between start and end, I do get the time component:

                   GOOG  AAPL  XOM  IBM  _CASH
2011-01-13 16:00:00     0     0    0    0      0
2011-01-14 16:00:00     0     0    0    0      0
2011-01-18 16:00:00     0     0    0    0      0
2011-01-19 16:00:00     0     0    0    0      0
2011-01-20 16:00:00     0     0    0    0      0
2011-01-21 16:00:00     0     0    0    0      0
2011-01-24 16:00:00     0     0    0    0      0
2011-01-25 16:00:00     0     0    0    0      0
2011-01-26 16:00:00     0     0    0    0      0
2011-01-27 16:00:00     0     0    0    0      0
2011-01-28 16:00:00     0     0    0    0      0
2011-01-31 16:00:00     0     0    0    0      0
2011-02-01 16:00:00     0     0    0    0      0
2011-02-02 16:00:00     0     0    0    0      0
2011-02-03 16:00:00     0     0    0    0      0
2011-02-04 16:00:00     0     0    0    0      0
2011-02-07 16:00:00     0     0    0    0      0
2011-02-08 16:00:00     0     0    0    0      0
2011-02-09 16:00:00     0     0    0    0      0
2011-02-10 16:00:00     0     0    0    0      0
2011-02-11 16:00:00     0     0    0    0      0
2011-02-14 16:00:00     0     0    0    0      0
2011-02-15 16:00:00     0     0    0    0      0
2011-02-16 16:00:00     0     0    0    0      0
2011-02-17 16:00:00     0     0    0    0      0
2011-02-18 16:00:00     0     0    0    0      0
2011-02-22 16:00:00     0     0    0    0      0
2011-02-23 16:00:00     0     0    0    0      0
2011-02-24 16:00:00     0     0    0    0      0
2011-02-25 16:00:00     0     0    0    0      0
                  ...   ...  ...  ...    ...

But if I create a df from the dates in df2, the time component gets dropped????:

I'm in datetime hell, as you can probably tell, Bob.

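An obvious solution is to make both indexes datetimes in the same format, but I don't know how to do that for df1. Note also that dropping the time component from the df2 index is not an option. This df must eventually be merged with a larger df that contains every datetime between start and end, that is, datetimes that include the time of day.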
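One way to approach this, sketched under assumptions, is to convert the date-only index of df1 to a DatetimeIndex and shift it to the same time of day as the other frame. The 16:00 offset is taken from the timestamps shown earlier, the small frame is a hypothetical stand-in, and the business-day range used for the larger frame is only an approximation because it does not skip market holidays.

```python
import pandas as pd

# Hypothetical df1 whose index holds dates only (here as strings).
df1 = pd.DataFrame(0, index=["2011-01-13", "2011-01-26"], columns=["GOOG", "AAPL"])

# Parse to a DatetimeIndex (midnight), then shift to 16:00 so the labels
# match timestamps such as 2011-01-13 16:00:00.
df1.index = pd.to_datetime(df1.index) + pd.Timedelta(hours=16)

# Stand-in for the larger frame covering every trading timestamp.
# freq="B" is plain business days and is an assumption, not a market calendar.
all_days = pd.date_range("2011-01-13 16:00:00", "2011-01-26 16:00:00", freq="B")
full = pd.DataFrame(0, index=all_days, columns=df1.columns)

# Sanity check before merging: labels in df1 that the larger frame lacks.
missing = df1.index.difference(full.index)
print(df1.index)
print(len(missing))
```

Checking `missing` before any merge or update is a cheap way to confirm that the two indexes really share the same labels.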

Many thanks for your efforts on my behalf, Bob.

0 answers:

No answers yet.