I have df1:
GOOG AAPL XOM IBM _CASH
2011-01-13 16:00:00 0 0 0 0 0
2011-01-14 16:00:00 0 0 0 0 0
2011-01-18 16:00:00 0 0 0 0 0
2011-01-19 16:00:00 0 0 0 0 0
2011-01-20 16:00:00 0 0 0 0 0
...
and df2:
year month day symbol order_type shares trans_date
0 2011 1 13 AAPL Sell 1500 2011-01-13
1 2011 1 13 IBM Buy 4000 2011-01-13
2 2011 1 26 GOOG Buy 1000 2011-01-26
3 2011 2 2 XOM Sell 4000 2011-02-02
4 2011 2 10 XOM Buy 4000 2011-02-10
5 2011 3 3 GOOG Sell 1000 2011-03-03
6 2011 3 3 IBM Sell 2200 2011-03-03
7 2011 6 3 IBM Sell 3300 2011-06-03
8 2011 5 3 IBM Buy 1500 2011-05-03
9 2011 6 10 AAPL Buy 1200 2011-06-10
10 2011 8 1 GOOG Buy 55 2011-08-01
11 2011 8 1 GOOG Sell 55 2011-08-01
12 2011 12 20 AAPL Sell 1200 2011-12-20
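I want to insert df2's 'shares' values into df1, matching df2's 'trans_date' to the df1 row with the same date and df2's 'symbol' to the df1 column with the same name. The sign of the value comes from 'order_type': positive for 'Buy' and negative for 'Sell'.
For example, for the first row of df2, the value -1500 would be inserted into df1's 'AAPL' column, in the row whose timestamp matches df2's 'trans_date'.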
Both df2's 'trans_date' and df1's timestamps are set as the index.
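A minimal sketch of the whole operation in pandas, assuming df2's 'trans_date' holds the same 16:00 timestamps as df1's index. The small frames below are hypothetical stand-ins built from the first rows of the question, not the real data. It signs the shares from 'order_type', nets out same-day orders per symbol, pivots symbols into columns, and adds the result to df1 by label:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for df1 and df2, built from the question's first rows.
dates = pd.to_datetime(["2011-01-13 16:00", "2011-01-14 16:00", "2011-01-26 16:00"])
df1 = pd.DataFrame(0, index=dates, columns=["GOOG", "AAPL", "XOM", "IBM", "_CASH"])
df2 = pd.DataFrame({
    "symbol": ["AAPL", "IBM", "GOOG"],
    "order_type": ["Sell", "Buy", "Buy"],
    "shares": [1500, 4000, 1000],
    "trans_date": pd.to_datetime(["2011-01-13 16:00", "2011-01-13 16:00", "2011-01-26 16:00"]),
})

# Buy -> +shares, Sell -> -shares.
df2["signed"] = np.where(df2["order_type"] == "Buy", df2["shares"], -df2["shares"])

# Net same-day orders per symbol, then pivot symbols into columns.
orders = df2.groupby(["trans_date", "symbol"])["signed"].sum().unstack(fill_value=0)

# Align to df1's timestamps and columns; cells with no order become 0.
orders = orders.reindex(index=df1.index, columns=df1.columns, fill_value=0)
result = df1 + orders
print(result)
```

Reindexing to df1's labels keeps df1's shape. Any order whose date or symbol is absent from df1 is silently dropped, so it may be worth checking for that first.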
Thanks for your help.
I stacked df1 and merged it with df2 using this:
pd.merge(df_stacked_mx, df_orders, left_index=True, right_index=True, how='inner')
and I get:
level_0 level_1 0 year month day symbol order_type shares \
0 2011-01-13 16:00:00 GOOG 0 2011 1 13 AAPL Sell 1500
1 2011-01-13 16:00:00 AAPL 0 2011 1 13 IBM Buy 4000
2 2011-01-13 16:00:00 XOM 0 2011 1 26 GOOG Buy 1000
3 2011-01-13 16:00:00 IBM 0 2011 2 2 XOM Sell 4000
4 2011-01-13 16:00:00 _CASH 0 2011 2 10 XOM Buy 4000
5 2011-01-14 16:00:00 GOOG 0 2011 3 3 GOOG Sell 1000
6 2011-01-14 16:00:00 AAPL 0 2011 3 3 IBM Sell 2200
7 2011-01-14 16:00:00 XOM 0 2011 6 3 IBM Sell 3300
8 2011-01-14 16:00:00 IBM 0 2011 5 3 IBM Buy 1500
9 2011-01-14 16:00:00 _CASH 0 2011 6 10 AAPL Buy 1200
10 2011-01-18 16:00:00 GOOG 0 2011 8 1 GOOG Buy 55
11 2011-01-18 16:00:00 AAPL 0 2011 8 1 GOOG Sell 55
12 2011-01-18 16:00:00 XOM 0 2011 12 20 AAPL Sell 1200
trans_date
0 2011-01-13 16:00:00
1 2011-01-13 16:00:00
2 2011-01-26 16:00:00
3 2011-02-02 16:00:00
4 2011-02-10 16:00:00
5 2011-03-03 16:00:00
6 2011-03-03 16:00:00
7 2011-06-03 16:00:00
8 2011-05-03 16:00:00
9 2011-06-10 16:00:00
10 2011-08-01 16:00:00
11 2011-08-01 16:00:00
12 2011-12-20 16:00:00
I'm not sure where to go from here and need help.
Drew
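In the merged output above, the stacked frame carries level_0 and level_1 as ordinary columns and has a plain 0..n integer index, so left_index=True, right_index=True pairs rows by position (row 0 puts GOOG on 2011-01-13 next to an AAPL Sell). A sketch of merging on the date and symbol keys instead, using melt as a stand-in for stack plus reset_index. The data and the column name 'signed' are hypothetical and assume the shares have already been signed:

```python
import pandas as pd

# Hypothetical small df1 and an already-signed orders table.
dates = pd.to_datetime(["2011-01-13 16:00", "2011-01-14 16:00"])
df1 = pd.DataFrame(0, index=dates, columns=["GOOG", "AAPL", "IBM"])
orders = pd.DataFrame({
    "trans_date": pd.to_datetime(["2011-01-13 16:00", "2011-01-13 16:00"]),
    "symbol": ["AAPL", "IBM"],
    "signed": [-1500, 4000],
})

# Long format with explicit key columns (same rows as stack() + reset_index()).
long1 = (
    df1.rename_axis("trans_date")
       .reset_index()
       .melt(id_vars="trans_date", var_name="symbol", value_name="value")
)

# Join on the keys rather than on the positional 0..n index.
merged = long1.merge(orders, on=["trans_date", "symbol"], how="left")
merged["value"] += merged["signed"].fillna(0).astype(int)
print(merged)
```

A left merge keeps every (date, symbol) cell of df1, and cells without an order keep their original value.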
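Thanks, Bob, for your contribution on GitHub. Very elegant, no loops. It would have taken me ages to get there.
There is one problem: updating the stacked df1 returns "None". The indexes don't match.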
My stacked df2 looks like this:
trans_date symbol
2011-01-13 16:00:00 AAPL -1500
IBM 4000
2011-01-26 16:00:00 GOOG 1000
2011-02-02 16:00:00 XOM -4000
2011-02-10 16:00:00 XOM 4000
2011-03-03 16:00:00 GOOG -1000
IBM -2200
2011-05-03 16:00:00 IBM 1500
2011-06-03 16:00:00 IBM -3300
2011-06-10 16:00:00 AAPL 1200
2011-08-01 16:00:00 GOOG 0
2011-12-20 16:00:00 AAPL -1200
Yours is:
trans_date symbol
2011-01-13 AAPL -1500
IBM 4000
2011-01-26 GOOG 1000
2011-02-02 XOM -4000
2011-02-10 XOM 4000
2011-03-03 GOOG -1000
IBM -2200
2011-05-03 IBM 1500
2011-06-03 IBM -3300
2011-06-10 AAPL 1200
2011-08-01 GOOG 0
2011-12-20 AAPL -1200
Name: shares, dtype: int64
Note that my dates include a time.
But my df1 has no times:
GOOG AAPL XOM IBM _CASH
2011-01-13 0 0 0 0 0
2011-01-26 0 0 0 0 0
2011-02-02 0 0 0 0 0
2011-02-10 0 0 0 0 0
2011-03-03 0 0 0 0 0
2011-05-03 0 0 0 0 0
2011-06-03 0 0 0 0 0
2011-06-10 0 0 0 0 0
2011-08-01 0 0 0 0 0
2011-12-20 0 0 0 0 0
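Two separate things seem to be going on here. First, DataFrame.update modifies the frame in place and always returns None, so a "None" result is expected and is not itself an error. Second, update aligns on index labels, and 2011-01-13 16:00:00 is a different label from 2011-01-13 (midnight), so nothing gets written. A small reproduction with hypothetical data:

```python
import pandas as pd

# Hypothetical df1 whose index carries dates only (i.e. midnight).
df1 = pd.DataFrame(0, index=pd.to_datetime(["2011-01-13", "2011-01-26"]), columns=["AAPL", "GOOG"])

# Hypothetical wide orders keyed at 16:00 on the same day.
orders = pd.DataFrame({"AAPL": [-1500], "GOOG": [0]}, index=pd.to_datetime(["2011-01-13 16:00"]))

ret = df1.update(orders)                      # works in place
print(ret)                                    # None, by design
print(df1.index.intersection(orders.index))   # empty: no shared labels
print(df1)                                    # still all zeros
```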
This is confusing. If I create a df with all the timestamps between start and end, I do get times:
GOOG AAPL XOM IBM _CASH
2011-01-13 16:00:00 0 0 0 0 0
2011-01-14 16:00:00 0 0 0 0 0
2011-01-18 16:00:00 0 0 0 0 0
2011-01-19 16:00:00 0 0 0 0 0
2011-01-20 16:00:00 0 0 0 0 0
2011-01-21 16:00:00 0 0 0 0 0
2011-01-24 16:00:00 0 0 0 0 0
2011-01-25 16:00:00 0 0 0 0 0
2011-01-26 16:00:00 0 0 0 0 0
2011-01-27 16:00:00 0 0 0 0 0
2011-01-28 16:00:00 0 0 0 0 0
2011-01-31 16:00:00 0 0 0 0 0
2011-02-01 16:00:00 0 0 0 0 0
2011-02-02 16:00:00 0 0 0 0 0
2011-02-03 16:00:00 0 0 0 0 0
2011-02-04 16:00:00 0 0 0 0 0
2011-02-07 16:00:00 0 0 0 0 0
2011-02-08 16:00:00 0 0 0 0 0
2011-02-09 16:00:00 0 0 0 0 0
2011-02-10 16:00:00 0 0 0 0 0
2011-02-11 16:00:00 0 0 0 0 0
2011-02-14 16:00:00 0 0 0 0 0
2011-02-15 16:00:00 0 0 0 0 0
2011-02-16 16:00:00 0 0 0 0 0
2011-02-17 16:00:00 0 0 0 0 0
2011-02-18 16:00:00 0 0 0 0 0
2011-02-22 16:00:00 0 0 0 0 0
2011-02-23 16:00:00 0 0 0 0 0
2011-02-24 16:00:00 0 0 0 0 0
2011-02-25 16:00:00 0 0 0 0 0
... ... ... ... ...
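If it helps to rebuild such an index directly, here is a sketch using weekdays at 16:00. The start/end bounds are hypothetical, and pd.bdate_range only skips weekends. The index above also skips exchange holidays (2011-01-17 and 2011-02-21 are missing), so it presumably comes from a trading calendar that this sketch does not reproduce:

```python
import pandas as pd

start, end = "2011-01-13", "2011-12-20"   # hypothetical bounds

# Weekdays only; exchange holidays are NOT removed here (assumption).
days = pd.bdate_range(start, end)
index_1600 = days + pd.Timedelta(hours=16)

symbols = ["GOOG", "AAPL", "XOM", "IBM", "_CASH"]
df_all = pd.DataFrame(0, index=index_1600, columns=symbols)
print(df_all.head())
```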
But if I create a df from the dates in df2, the time component disappears. Why?
As you can probably tell, Bob, I'm in datetime hell.
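The code that built this frame isn't shown, so this is only one likely explanation: if the dates are assembled from the year/month/day columns, every timestamp lands at midnight, and pandas prints a datetime column without its time part when all values are at midnight. A quick check, with hypothetical data:

```python
import pandas as pd

df2 = pd.DataFrame({"year": [2011, 2011], "month": [1, 1], "day": [13, 26]})

# Assembling from year/month/day columns yields midnight timestamps.
trans_date = pd.to_datetime(df2[["year", "month", "day"]])
print(trans_date)   # printed as dates only, because every time is 00:00

# True means the times are midnight, not missing.
all_midnight = bool((trans_date == trans_date.dt.normalize()).all())
print(all_midnight)

# Put the 16:00 back explicitly instead of relying on the display.
trans_date_1600 = trans_date + pd.Timedelta(hours=16)
print(trans_date_1600)
```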
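One obvious solution is to put both indexes in the same datetime format, but I don't know how to do that for df1. Also note that dropping the time component from the df2 index is not an option: this df must eventually be merged with a larger df that contains every datetime between start and end, i.e. datetimes that include a time of day.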
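Here is a sketch of that conversion. It assumes df1's index holds date-only labels (strings or date objects) and that 16:00 is the time used everywhere else. Parse with pd.to_datetime, normalize, and add 16 hours; the labels then match the 16:00 order timestamps without touching df2. The 'orders' frame here is a hypothetical wide version of the stacked df2:

```python
import pandas as pd

# Hypothetical df1 with date-only labels, as in the question.
df1 = pd.DataFrame(0, index=["2011-01-13", "2011-01-26"], columns=["AAPL", "GOOG"])

# Hypothetical wide orders keyed at 16:00, like the stacked df2.
orders = pd.DataFrame(
    {"AAPL": [-1500, 0], "GOOG": [0, 1000]},
    index=pd.to_datetime(["2011-01-13 16:00", "2011-01-26 16:00"]),
)

# Parse the labels, drop any stray time, then pin every row to 16:00.
df1.index = pd.to_datetime(df1.index).normalize() + pd.Timedelta(hours=16)

# Both indexes are now 16:00 timestamps, so label alignment works.
df1.update(orders)   # in place; returns None
print(df1)
```

update overwrites matching cells rather than adding to them. With df1 all zeros that is the same thing, but df1 + orders.reindex(index=df1.index, columns=df1.columns, fill_value=0) accumulates instead.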
Many thanks for your efforts on my behalf, Bob.