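I'm sure this is simple, but it has stumped me for too long, so I'd really appreciate some direction.
I want to add a column to a DataFrame based on the values of two other columns: I want to determine whether the stock equals the previous row's stock and the date equals the previous row's date.
I then want a running count. I tried the following: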
df['DayCount']=np.where(df['ticker'] ==df['ticker'].shift()) & np.where(df['trade_date']==df['trade_date'].shift() , 1, 0)
and
df['DayCount'] = df.where(df['ticker'] ==df['ticker'].shift() & df['trade_date']==df['trade_date'].shift(),1,0)
Sample input:
Stock, Date, Time, Price
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92
Output:
Stock, Date, Time, Price, DayCount
IBM, 2014-09-01, 12:30:01, 50.5, 1
IBM, 2014-09-01, 12:30:02, 50.7, 2
IBM, 2014-09-01, 12:30:03, 50.9, 3
IBM, 2014-09-02, 09:57:02, 52.7, 1
IBM, 2014-09-02, 09:57:03, 52.9, 2
AAPL, 2014-11-02, 09:57:02, 520.31, 1
AAPL, 2014-11-02, 09:57:03, 520.92, 2
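I get errors such as
TypeError: unsupported operand type(s) for &: 'str' and 'bool'
The plan was then to apply a cumulative count.
First, and most importantly for me, how do you write the initial statement so that the comparison covers multiple columns?
Second, how would you then add the cumulative count?
Many thanks for your help.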
Expanding on the original post, here is a follow-up question. Suppose the data set is now slightly different:
Stock, Date, Time, Price, BidOffer
IBM, 2014-09-01, 12:30:01, 50.5, bid
IBM, 2014-09-01, 12:30:02, 50.7, offer
IBM, 2014-09-01, 12:30:03, 50.9, bid
IBM, 2014-09-02, 09:57:02, 52.7, bid
IBM, 2014-09-02, 09:57:03, 52.9, bid
AAPL, 2014-11-02, 09:57:02, 520.31, offer
AAPL, 2014-11-02, 09:57:03, 520.92, offer
We want to see how many times in a row the stock traded on the bid or on the offer, so the output would be:
Stock, Date, Time, Price, BidOffer, Count
IBM, 2014-09-01, 12:30:01, 50.5, bid, 1
IBM, 2014-09-01, 12:30:02, 50.7, offer, 1
IBM, 2014-09-01, 12:30:03, 50.9, bid, 1
IBM, 2014-09-02, 09:57:02, 52.7, bid, 1
IBM, 2014-09-02, 09:57:03, 52.9, bid, 2
AAPL, 2014-11-02, 09:57:02, 520.31, offer, 1
AAPL, 2014-11-02, 09:57:03, 520.92, offer, 2
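The grouping is really by stock and date; the time is only used to determine the sequence. Any help with this extension would be appreciated.
Answer 0 (score: 1)
UPDATE3: "and we want to see how many consecutive times the stock traded on the bid or the offer"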
In [112]: g = df.groupby(['Stock','Date'])
In [113]: df['Count'] = g['BidOffer'].apply(lambda x: (x == x.shift()).cumsum()) + 1
In [114]: df
Out[114]:
   Stock        Date      Time   Price  BidOffer  Count
0    IBM  2014-09-01  12:30:01   50.50       bid      1
1    IBM  2014-09-01  12:30:02   50.70     offer      1
2    IBM  2014-09-01  12:30:03   50.90       bid      1
3    IBM  2014-09-02  09:57:02   52.70       bid      1
4    IBM  2014-09-02  09:57:03   52.90       bid      2
5   AAPL  2014-11-02  09:57:02  520.31     offer      1
6   AAPL  2014-11-02  09:57:03  520.92     offer      2
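One caveat on the cumsum approach in In [113]: it keeps accumulating within a group, so a sequence such as bid, bid, offer, bid, bid would come out as 1, 2, 2, 2, 3 rather than restarting at 1 after the run is broken. If a count that resets on every change is what you need, a possible sketch is to label each run first and then use cumcount within the runs. The sample frame and the helper names keys, new_run and run_id below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: one stock, one day, with an interrupted run of bids.
df = pd.DataFrame({
    "Stock": ["IBM"] * 5,
    "Date": ["2014-09-01"] * 5,
    "BidOffer": ["bid", "bid", "offer", "bid", "bid"],
})

keys = ["Stock", "Date", "BidOffer"]

# A new run starts whenever any key differs from the previous row
# (the first row always starts a run because shift() yields NaN there).
new_run = (df[keys] != df[keys].shift()).any(axis=1)

# Cumulative sum of the run starts gives each run its own label.
run_id = new_run.cumsum()

# Position within each run, starting from 1.
df["Count"] = df.groupby(run_id).cumcount() + 1

print(df)
```

This assumes the rows are already sorted by time within each stock and date, since the runs are defined by row order.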
UPDATE2:
In [515]: df['DayCount'] = df.groupby(['Stock', 'Date', 'BidOffer']).cumcount() + 1
In [516]: df
Out[516]:
   Stock        Date      Time   Price  BidOffer  DayCount
0    IBM  2014-09-01  12:30:01   50.50       bid         1
1    IBM  2014-09-01  12:30:02   50.70     offer         1
2    IBM  2014-09-01  12:30:03   50.90       bid         2
3    IBM  2014-09-02  09:57:02   52.70       bid         1
4    IBM  2014-09-02  09:57:03   52.90       bid         2
5   AAPL  2014-11-02  09:57:02  520.31     offer         1
6   AAPL  2014-11-02  09:57:03  520.92     offer         2
UPDATE:
In [489]: df['DayCount'] = df.groupby(['Stock', df.Datetime.dt.date]).cumcount() + 1
In [490]: df
Out[490]:
   Stock             Datetime   Price  DayCount
0    IBM  2014-09-01 12:30:01   50.50         1
1    IBM  2014-09-01 12:30:02   50.70         2
2    IBM  2014-09-01 12:30:03   50.90         3
3    IBM  2014-09-02 09:57:02   52.70         1
4    IBM  2014-09-02 09:57:03   52.90         2
5   AAPL  2014-11-02 09:57:02  520.31         1
6   AAPL  2014-11-02 09:57:03  520.92         2
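For anyone who wants to reproduce the running count end to end, here is a self-contained sketch built from the sample data in the question. It assumes the Date and Time columns are strings that get combined into a single Datetime column, which is my guess at how the frame in In [489] was prepared.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Stock": ["IBM", "IBM", "IBM", "IBM", "IBM", "AAPL", "AAPL"],
    "Date": ["2014-09-01", "2014-09-01", "2014-09-01",
             "2014-09-02", "2014-09-02", "2014-11-02", "2014-11-02"],
    "Time": ["12:30:01", "12:30:02", "12:30:03",
             "09:57:02", "09:57:03", "09:57:02", "09:57:03"],
    "Price": [50.5, 50.7, 50.9, 52.7, 52.9, 520.31, 520.92],
})

# Assumption: build one Datetime column from the two string columns.
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])

# cumcount() numbers the rows of each (Stock, calendar day) group from 0,
# so adding 1 gives a running count that starts at 1.
df["DayCount"] = df.groupby(["Stock", df["Datetime"].dt.date]).cumcount() + 1

print(df[["Stock", "Datetime", "Price", "DayCount"]])
```

If the frame already has a separate Date column, grouping on ["Stock", "Date"] directly should give the same counts.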
Answer to the original question:
df['DayCount']=np.where(
(df['ticker']==df['ticker'].shift())
&
(df['trade_date']==df['trade_date'].shift()),
1,
0
)
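The only thing missing in your second attempt was the parentheses around each comparison (together with calling np.where rather than df.where): np.where( (...) & (...), 1, 0). The & operator binds more tightly than ==, so without parentheses it is applied to the raw columns before the comparisons are evaluated.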
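As a runnable illustration of the multi-column comparison, the sketch below builds each condition as its own boolean Series and only then combines them with &. The column names ticker and trade_date follow the question's code; the sample rows and the SameAsPrev column name are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame using the column names from the question's code.
df = pd.DataFrame({
    "ticker": ["IBM", "IBM", "IBM", "IBM", "IBM", "AAPL", "AAPL"],
    "trade_date": ["2014-09-01", "2014-09-01", "2014-09-01",
                   "2014-09-02", "2014-09-02", "2014-11-02", "2014-11-02"],
})

# Each comparison is evaluated separately, giving a boolean Series.
same_ticker = df["ticker"] == df["ticker"].shift()
same_date = df["trade_date"] == df["trade_date"].shift()

# Combining two boolean Series with & is safe; np.where maps True/False to 1/0.
df["SameAsPrev"] = np.where(same_ticker & same_date, 1, 0)

print(df)
```

Note that this flag only marks whether a row matches the previous one (1) or starts a new stock/date block (0); it is not yet the running count, which is what the groupby/cumcount approach provides.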