Question

Suppose I have a set of stock events stored as a dataframe that looks like

  date     ticker price
0 2017-1-2 'AAPL' 130.00
1 2017-1-2 'ZNGA' 2.82

(etc.)

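I want to select only the rows that correspond to stocks in the S&P 500. The obvious approach is to create a dictionary sp500dict whose keys are the S&P 500 names, and then do something like df[df['ticker'] in sp500dict]. However, this (and a few other schemes I have tried) fails, in this case with: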

TypeError: 'Series' objects are mutable, thus they cannot be hashed

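Any suggestions? There is a horrible kludge of creating a dataframe whose rows contain the elements of the dictionary and then doing a join, but that seems a bit extreme.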

Answer 1

Try this:

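import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_S%26P_500_companies'
# Literal replacement of '.' with '-'; the iloc slice depends on the page's table layout
sp500 = pd.read_html(url)[0].iloc[1:, 0].str.replace('.', '-', regex=False)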


In [66]: df[df['ticker'].isin(sp500)]
Out[66]:
       date ticker  price
0  2017-1-2   AAPL  130.0
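
If you already have the dictionary from the question, you do not need the Wikipedia download at all. Here is a minimal offline sketch of the same isin approach; the sample rows and the contents of sp500dict are made up for illustration. The original expression fails because `in` tries to hash the whole Series to look it up as a single key, while isin tests each element separately.

```python
import pandas as pd

# Hypothetical sample data mirroring the layout in the question.
df = pd.DataFrame({
    "date": ["2017-1-2", "2017-1-2"],
    "ticker": ["AAPL", "ZNGA"],
    "price": [130.00, 2.82],
})

# Hypothetical stand-in for the asker's sp500dict (keys are tickers).
sp500dict = {"AAPL": "Apple Inc.", "MSFT": "Microsoft Corp."}

# list(sp500dict) yields the keys; isin builds an element-wise boolean mask.
mask = df["ticker"].isin(list(sp500dict))
selected = df[mask]
print(selected)
```

Only the AAPL row survives here, since ZNGA is not a key of the illustrative dictionary.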

Timing for a 200,000-row DataFrame:

In [102]: df = pd.concat([df] * 10**5, ignore_index=True)

In [103]: df.shape
Out[103]: (200000, 3)

In [104]: s = sp500.to_frame('ticker')

In [105]: %timeit df[df['ticker'].isin(sp500)]
10 loops, best of 3: 42.4 ms per loop

In [106]: %timeit pd.merge(df, s)
10 loops, best of 3: 50.2 ms per loop
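
Apart from speed, the two approaches differ slightly in what they return. A small sketch with made-up data (the third row and the lookup tickers are assumptions for illustration): isin keeps the original index labels, while an inner merge returns a fresh 0..n-1 index, and it would also repeat rows if the lookup frame contained duplicate tickers.

```python
import pandas as pd

# Hypothetical sample data; the third row is added only for illustration.
df = pd.DataFrame({
    "date": ["2017-1-2", "2017-1-2", "2017-1-3"],
    "ticker": ["AAPL", "ZNGA", "AAPL"],
    "price": [130.00, 2.82, 131.50],
})

# Hypothetical lookup frame with unique tickers, like `s` in the timing above.
s = pd.Series(["AAPL", "MSFT"]).to_frame("ticker")

by_isin = df[df["ticker"].isin(s["ticker"])]   # keeps index labels 0 and 2
by_merge = pd.merge(df, s, on="ticker")        # inner join, new index 0 and 1

print(by_isin)
print(by_merge)
```

With unique lookup tickers both select the same rows, so the choice mostly comes down to whether you want the original index preserved.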

Select rows in a pandas dataframe based on dictionary membership