Pandas - conditional calculation based on shifted values of two other columns

Time: 2016-06-11 14:15:14

Tags: python numpy pandas dataframe

I'm sure this is a simple question, but it has been bugging me for too long, so I'd really appreciate some direction.

I want to add a column to a DataFrame based on the result of two other columns.

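Specifically, I want to check whether the ticker equals the previous row's ticker and whether the date equals the previous row's date.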

I want to get a running count. I tried the following:

df['DayCount']=np.where(df['ticker'] ==df['ticker'].shift()) & np.where(df['trade_date']==df['trade_date'].shift() ,  1, 0)

df['DayCount'] = df.where(df['ticker'] ==df['ticker'].shift() &    df['trade_date']==df['trade_date'].shift(),1,0)

Sample input:

Stock, Date, Time, Price 
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92

Desired output:

Stock, Date, Time, Price, DayCount
IBM, 2014-09-01, 12:30:01, 50.5, 1
IBM, 2014-09-01, 12:30:02, 50.7, 2
IBM, 2014-09-01, 12:30:03, 50.9, 3
IBM, 2014-09-02, 09:57:02, 52.7, 1
IBM, 2014-09-02, 09:57:03, 52.9, 2
AAPL, 2014-11-02, 09:57:02, 520.31, 1
AAPL, 2014-11-02, 09:57:03, 520.92, 2

I'm getting errors like this:
TypeError: unsupported operand type(s) for &: 'str' and 'bool'

The plan was then to apply a cumulative count.

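First, and most important for me: how do you write the initial statement so that you can compare multiple columns?

Second, how would you add the cumulative count?

Many thanks for your help.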

Extending the original post, here is another question. Suppose the dataset is now slightly different:

Stock, Date, Time, Price, BidOffer
IBM, 2014-09-01, 12:30:01, 50.5, bid
IBM, 2014-09-01, 12:30:02, 50.7, offer
IBM, 2014-09-01, 12:30:03, 50.9, bid
IBM, 2014-09-02, 09:57:02, 52.7, bid
IBM, 2014-09-02, 09:57:03, 52.9, bid
AAPL, 2014-11-02, 09:57:02, 520.31, offer
AAPL, 2014-11-02, 09:57:03, 520.92, offer

We want to see how many consecutive times the stock traded on the bid or on the offer, so the output would be:

Stock, Date, Time, Price, BidOffer, Count
IBM, 2014-09-01, 12:30:01, 50.5, bid, 1
IBM, 2014-09-01, 12:30:02, 50.7, offer, 1
IBM, 2014-09-01, 12:30:03, 50.9, bid, 1
IBM, 2014-09-02, 09:57:02, 52.7, bid, 1
IBM, 2014-09-02, 09:57:03, 52.9, bid, 2
AAPL, 2014-11-02, 09:57:02, 520.31, offer, 1
AAPL, 2014-11-02, 09:57:03, 520.92, offer, 2

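The grouping is really by Stock and Date; Time is only used to determine the order of the rows. Any help with this extension would be appreciated.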

1 Answer:

Answer 0 (score: 1)

UPDATE3: "and we want to see how many consecutive times it traded on the bid or on the offer"

In [112]: g = df.groupby(['Stock','Date'])

In [113]: df['Count'] = g['BidOffer'].apply(lambda x: (x == x.shift()).cumsum()) + 1

In [114]: df
Out[114]:
  Stock       Date      Time   Price BidOffer  Count
0   IBM 2014-09-01  12:30:01   50.50      bid      1
1   IBM 2014-09-01  12:30:02   50.70    offer      1
2   IBM 2014-09-01  12:30:03   50.90      bid      1
3   IBM 2014-09-02  09:57:02   52.70      bid      1
4   IBM 2014-09-02  09:57:03   52.90      bid      2
5  AAPL 2014-11-02  09:57:02  520.31    offer      1
6  AAPL 2014-11-02  09:57:03  520.92    offer      2
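The shortcut above, (x == x.shift()).cumsum(), keeps accumulating across runs inside a group, so a day with bid, bid, offer, offer would come out as 1, 2, 2, 3 instead of 1, 2, 1, 2. It matches the desired output here only because every repeated run in the sample sits at the end of its (Stock, Date) group. Below is a minimal sketch of a run-length count that restarts at every change, assuming rows are already sorted by Time within each group; the helper name consecutive_count and the four-row demo frame are hypothetical. It also avoids groupby.apply, whose index handling for same-shaped results has changed across pandas versions.

```python
import io

import pandas as pd

# Second sample from the question (blank after each comma removed).
DATA = """Stock,Date,Time,Price,BidOffer
IBM,2014-09-01,12:30:01,50.5,bid
IBM,2014-09-01,12:30:02,50.7,offer
IBM,2014-09-01,12:30:03,50.9,bid
IBM,2014-09-02,09:57:02,52.7,bid
IBM,2014-09-02,09:57:03,52.9,bid
AAPL,2014-11-02,09:57:02,520.31,offer
AAPL,2014-11-02,09:57:03,520.92,offer
"""


def consecutive_count(frame, keys=("Stock", "Date"), col="BidOffer"):
    """Hypothetical helper: 1-based position of each row within its run of equal `col` values.

    A run restarts when `col` changes or a new `keys` group begins.
    Assumes rows are already ordered by Time inside each group.
    """
    # Previous value inside the same group; NaN on the first row of every group.
    prev = frame.groupby(list(keys))[col].shift()
    # True where a run starts: the value changed, or it is a group's first row (x != NaN is True).
    new_run = frame[col] != prev
    # Cumulative count of run starts gives every run its own id.
    run_id = new_run.cumsum().rename("run_id")
    return frame.groupby(run_id).cumcount() + 1


df = pd.read_csv(io.StringIO(DATA))
df["Count"] = consecutive_count(df)
print(df)

# Hypothetical one-day sequence where the cumsum shortcut and this sketch disagree.
demo = pd.DataFrame({"Stock": "IBM", "Date": "2014-09-01",
                     "BidOffer": ["bid", "bid", "offer", "offer"]})
x = demo["BidOffer"]
print(((x == x.shift()).cumsum() + 1).tolist())  # [1, 2, 2, 3]
print(consecutive_count(demo).tolist())          # [1, 2, 1, 2]
```

Using groupby(...).shift() keeps each comparison inside its own (Stock, Date) group, so the first row of a new day never continues the previous day's run.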

UPDATE2:

In [515]: df['DayCount'] = df.groupby(['Stock', 'Date', 'BidOffer']).cumcount() + 1

In [516]: df
Out[516]:
  Stock       Date      Time   Price BidOffer  DayCount
0   IBM 2014-09-01  12:30:01   50.50      bid         1
1   IBM 2014-09-01  12:30:02   50.70    offer         1
2   IBM 2014-09-01  12:30:03   50.90      bid         2
3   IBM 2014-09-02  09:57:02   52.70      bid         1
4   IBM 2014-09-02  09:57:03   52.90      bid         2
5  AAPL 2014-11-02  09:57:02  520.31    offer         1
6  AAPL 2014-11-02  09:57:03  520.92    offer         2

UPDATE:

In [489]: df['DayCount'] = df.groupby(['Stock', df.Datetime.dt.date]).cumcount() + 1

In [490]: df
Out[490]:
  Stock            Datetime   Price  DayCount
0   IBM 2014-09-01 12:30:01   50.50         1
1   IBM 2014-09-01 12:30:02   50.70         2
2   IBM 2014-09-01 12:30:03   50.90         3
3   IBM 2014-09-02 09:57:02   52.70         1
4   IBM 2014-09-02 09:57:03   52.90         2
5  AAPL 2014-11-02 09:57:02  520.31         1
6  AAPL 2014-11-02 09:57:03  520.92         2
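The UPDATE snippet assumes Date and Time have already been combined into a single datetime64 column named Datetime; that step is not shown. A minimal sketch of one way to build it, assuming the first sample is read from CSV text (skipinitialspace=True handles the blank after each comma):

```python
import io

import pandas as pd

# First sample from the question, with the blanks after the commas kept as posted.
DATA = """Stock, Date, Time, Price
IBM, 2014-09-01, 12:30:01, 50.5
IBM, 2014-09-01, 12:30:02, 50.7
IBM, 2014-09-01, 12:30:03, 50.9
IBM, 2014-09-02, 09:57:02, 52.7
IBM, 2014-09-02, 09:57:03, 52.9
AAPL, 2014-11-02, 09:57:02, 520.31
AAPL, 2014-11-02, 09:57:03, 520.92
"""

# skipinitialspace=True strips the blank that follows each comma (header included).
df = pd.read_csv(io.StringIO(DATA), skipinitialspace=True)

# Merge the separate Date and Time strings into one datetime64 column.
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"])
df = df.drop(columns=["Date", "Time"])

# Number the rows within each (Stock, calendar day) group, starting at 1.
df["DayCount"] = df.groupby(["Stock", df["Datetime"].dt.date]).cumcount() + 1
print(df)
```

cumcount numbers rows in their existing order, so the input should already be sorted by time within each stock.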

Answer to the original question:

import numpy as np

# 1 when both ticker and trade_date match the previous row, otherwise 0
df['DayCount'] = np.where(
    (df['ticker'] == df['ticker'].shift())
    & (df['trade_date'] == df['trade_date'].shift()),
    1,
    0,
)

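The only thing missing from your second attempt is the parentheses around each comparison: np.where( (...) & (...), 1, 0). In Python, & binds more tightly than ==, so without them df['ticker'].shift() & df['trade_date'] is evaluated first, which is where the TypeError comes from. Note also that the fix uses np.where, as in your first attempt: DataFrame.where keeps values where the condition is True and replaces the rest, so it does not build a 1/0 column.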
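For the second part of the question (turning the flag into a running count), one option is to treat each 0 in the flag as the start of a new block and number the rows within each block. A sketch, assuming the column names ticker and trade_date from the question's code (the sample data calls them Stock and Date) and rows sorted by ticker, date and time; the column name Flag is hypothetical:

```python
import numpy as np
import pandas as pd

# Assumed column names from the question's code; values follow the first sample.
df = pd.DataFrame({
    "ticker": ["IBM", "IBM", "IBM", "IBM", "IBM", "AAPL", "AAPL"],
    "trade_date": ["2014-09-01"] * 3 + ["2014-09-02"] * 2 + ["2014-11-02"] * 2,
})

# Each comparison needs its own parentheses because & binds tighter than ==.
same_as_prev = (df["ticker"] == df["ticker"].shift()) & (
    df["trade_date"] == df["trade_date"].shift()
)
df["Flag"] = np.where(same_as_prev, 1, 0)

# A 0 marks the first row of a new (ticker, trade_date) block; numbering the blocks
# and counting rows within each one turns the flag into a running count.
block_id = (df["Flag"] == 0).cumsum().rename("block_id")
df["DayCount"] = df.groupby(block_id).cumcount() + 1
print(df)
```

Unlike df.groupby(['ticker', 'trade_date']).cumcount(), this restarts whenever the pair changes between adjacent rows, so it depends on row order; for data sorted by ticker, date and time the two give the same result.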