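Program parses the wrong table

Time: 2019-07-02 03:52:30

Tags: python beautifulsoup

I am a bit confused about why my code returns "Cannabis Stocks", which sits under the table with class = cwl-performance. I am trying to collect the ticker symbols from the table with class = cwl-symbols.

As you can see from my code, I specify class = cwl-symbols. I don't understand why I am getting a result from the table with class = cwl-performance.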

import bs4 as bs
import pickle
import requests


def cannabisTickers():
    resp = requests.get('https://finance.yahoo.com/u/yahoo-finance/watchlists/420_stocks/')
    soup = bs.BeautifulSoup(resp.txt, 'lxml')
    table = soup.findAll('table', {'class' : 'cwl-symbols'})
    tickers = []
    for row in table.findAll('tr'):
        ticker = row.findAll('td').text
        tickers.append(ticker)
    print(tickers)

My result is "title>Cannabis Stocks" title>", which comes from the wrong table.

2 answers:

Answer 0 (score: 2)

Use the pandas library and set a User-Agent in the request headers. The pandas tolist() method converts a Series to a list.

import requests
import pandas as pd

def cannabisTickers():
    resp = requests.get('https://finance.yahoo.com/u/yahoo-finance/watchlists/420_stocks/',headers={
                            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'
                        })
    table = pd.read_html(resp.content)[1]
    print(table['Symbol'].tolist())

if __name__ == '__main__':
    cannabisTickers()

Output:

['BUD', 'ABBV', 'MO', 'WEED.TO', 'TAP', 'CGC', 'ACB', 'SMG', 'GWPH', 'CRON', 'TLRY', 'TGOD.TO', 'TGODF', 'TRST.TO', 'CRBP', 'HYG.TO', 'CTST', 'NBEV', 'TRTC', 'CANN', 'MJ']
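
The positional index `[1]` above depends on the order of the tables on the page. As a hedged alternative, `pd.read_html` accepts an `attrs` argument that restricts the result to tables with matching attributes. The sketch below runs on made-up sample markup (not Yahoo's actual page), and `symbols_from_html` is a hypothetical helper name. Depending on which parser pandas uses, the `class` value may need to match the attribute exactly, so this may need adjusting if the real table carries additional classes.

```python
from io import StringIO

import pandas as pd

# Made-up sample markup with two tables, standing in for the real page.
SAMPLE_HTML = """
<table class="cwl-performance">
  <thead><tr><th>Watchlist</th></tr></thead>
  <tbody><tr><td>Cannabis Stocks</td></tr></tbody>
</table>
<table class="cwl-symbols">
  <thead><tr><th>Symbol</th><th>Company Name</th></tr></thead>
  <tbody>
    <tr><td>BUD</td><td>Anheuser-Busch InBev SA/NV</td></tr>
    <tr><td>ABBV</td><td>AbbVie Inc.</td></tr>
  </tbody>
</table>
"""


def symbols_from_html(html):
    """Return the 'Symbol' column of the table whose class is cwl-symbols."""
    # attrs limits read_html to tables with these attributes, so the
    # result no longer depends on where the table appears in the page.
    tables = pd.read_html(StringIO(html), attrs={'class': 'cwl-symbols'})
    return tables[0]['Symbol'].tolist()


if __name__ == '__main__':
    print(symbols_from_html(SAMPLE_HTML))
```

With the live page, the same call would take `resp.text` from the request shown above in place of `SAMPLE_HTML`.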

Answer 1 (score: 0)

I had to set a User-Agent header before I got the correct HTML response. After that, it is just a matter of extracting the right data.

import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    resp = requests.get('https://finance.yahoo.com/u/yahoo-finance/watchlists/420_stocks/',
                        headers={
                            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'
                        })
    assert '<table class="cwl-symbols' in resp.text
    soup = BeautifulSoup(resp.text, 'html.parser')
    table = soup.select_one('.cwl-symbols')
    tickers = []
    for row in table.select('tr'):
        ticker = [cell.text for cell in row.select('td')]
        if ticker:
            tickers.append(ticker)
    print(tickers)

Prints:

[['BUD',
  'Anheuser-Busch InBev SA/NV',
  '88.43',
  '-0.08',
  '-0.09%',
  '4:02 PM EDT',
  '1.02M',
  '1.11M',
  '173.00B'],
 ['ABBV',
  'AbbVie Inc.',
  '73.4',
  '+0.68',
  '+0.94%',
  '4:00 PM EDT',
  '17.91M',
  '7.72M',
  '108.51B'],
 ['MO',
  'Altria Group, Inc.',
  '47.69',
  '+0.34',
  '+0.72%',
  '4:02 PM EDT',
  '7.16M',
  '8.18M',
  '89.22B'],
 ['WEED.TO',
  'Canopy Growth Corporation',
  '52.87',
  '-0.49',
  '-0.92%',
  '3:59 PM EDT',
  '1.74M',
  '2.16M',
  '18.29B'],
 ['CGC',
  'Canopy Growth Corporation',
  '40.59',
  '+0.28',
  '+0.69%',
  '4:00 PM EDT',
  '1.72M',
  '4.88M',
  '14.06B'],
 ['TAP',
  'Molson Coors Brewing Company',
  '56.31',
  '+0.31',
  '+0.55%',
  '4:02 PM EDT',
  '1.24M',
  '1.59M',
  '12.18B'],
 ['ACB',
  'Aurora Cannabis Inc.',
  '7.83',
  '+0.01',
  '+0.13%',
  '4:01 PM EDT',
  '6.12M',
  '14.84M',
  '7.94B'],
 ['SMG',
  'The Scotts Miracle-Gro Company',
  '98.98',
  '+0.48',
  '+0.49%',
  '4:02 PM EDT',
  '386.87k',
  '434.24k',
  '5.49B'],
 ['CRON',
  'Cronos Group Inc.',
  '16.02',
  '+0.04',
  '+0.25%',
  '4:00 PM EDT',
  '3.28M',
  '5.35M',
  '5.37B'],
 ['GWPH',
  'GW Pharmaceuticals plc',
  '172.64',
  '+0.25',
  '+0.15%',
  '4:00 PM EDT',
  '284.32k',
  '387.01k',
  '5.16B'],
 ['TLRY',
  'Tilray, Inc.',
  '49.3',
  '+2.74',
  '+5.88%',
  '4:00 PM EDT',
  '1.89M',
  '1.71M',
  '4.79B'],
 ['TRST.TO',
  'CannTrust Holdings Inc.',
  '6.56',
  '+0.01',
  '+0.15%',
  '4:00 PM EDT',
  '698.69k',
  '1.71M',
  '926.17M'],
 ['TGOD.TO',
  'The Green Organic Dutchman Holdings Ltd.',
  '3.23',
  '+0.05',
  '+1.57%',
  '4:00 PM EDT',
  '534.93k',
  '2.03M',
  '886.22M'],
 ['CTST',
  'CannTrust Holdings Inc.',
  '5.12',
  '+0.10',
  '+1.99%',
  '4:02 PM EDT',
  '1.52M',
  '2.89M',
  '721.33M'],
 ['TGODF',
  'The Green Organic Dutchman Holdings Ltd.',
  '2.52',
  '+0.04',
  '+1.61%',
  '3:59 PM EDT',
  '410.14k',
  '655.39k',
  '676.91M'],
 ['CRBP',
  'Corbus Pharmaceuticals Holdings, Inc.',
  '7.08',
  '+0.15',
  '+2.16%',
  '4:00 PM EDT',
  '631.39k',
  '999.85k',
  '456.34M'],
 ['HYG.TO',
  'Hydrogenics Corporation',
  '19.47',
  '-0.84',
  '-4.14%',
  '3:52 PM EDT',
  '57.50k',
  '9.21k',
  '369.68M'],
 ['NBEV',
  'New Age Beverages Corporation',
  '4.75',
  '+0.09',
  '+1.93%',
  '4:00 PM EDT',
  '2.50M',
  '5.13M',
  '365.03M'],
 ['TRTC',
  'Terra Tech Corp.',
  '0.58725',
  '+0.01',
  '+1.95%',
  '3:50 PM EDT',
  '371.50k',
  '795.50k',
  '60.59M'],
 ['CANN',
  'General Cannabis Corp',
  '0.76',
  '-0.05',
  '-5.59%',
  '3:59 PM EDT',
  '233.49k',
  '221.00k',
  '27.55M'],
 ['MJ',
  'ETFMG Alternative Harvest ETF',
  '32.1',
  '+0.42',
  '+1.33%',
  '4:00 PM EDT',
  '244.55k',
  '514.25k',
  '-']]
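
The output above contains every cell of each row. If only the ticker symbols are wanted, as in the question, taking the first `td` of each row is enough. The following is a minimal sketch on made-up sample markup (not Yahoo's actual page); `extract_symbols` is a hypothetical helper name. Note that `find` and `select_one` return a single element, whereas `findAll` returns a list of elements, which has no `.text` of its own.

```python
from bs4 import BeautifulSoup

# Made-up sample markup with two tables, standing in for the real page.
SAMPLE_HTML = """
<table class="cwl-performance">
  <tr><th>Watchlist</th></tr>
  <tr><td>Cannabis Stocks</td></tr>
</table>
<table class="cwl-symbols">
  <tr><th>Symbol</th><th>Company Name</th></tr>
  <tr><td>BUD</td><td>Anheuser-Busch InBev SA/NV</td></tr>
  <tr><td>ABBV</td><td>AbbVie Inc.</td></tr>
</table>
"""


def extract_symbols(html):
    """Return the text of the first cell of each row in the cwl-symbols table."""
    soup = BeautifulSoup(html, 'html.parser')
    # select_one returns a single Tag (or None), not a list of tags.
    table = soup.select_one('table.cwl-symbols')
    if table is None:
        return []
    symbols = []
    for row in table.select('tr'):
        first_cell = row.find('td')  # None for header rows made of <th>
        if first_cell is not None:
            symbols.append(first_cell.get_text(strip=True))
    return symbols


if __name__ == '__main__':
    print(extract_symbols(SAMPLE_HTML))
```

With the live page, `resp.text` from the request above would be passed in place of `SAMPLE_HTML`.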