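I'm a bit confused about why my code returns "Cannabis Stocks" (which sits under the table with class = cwl-performance). I'm trying to collect the ticker symbols from the table with class = cwl-symbols.
As you can see in my code, I specify class = cwl-symbols, so I don't understand why I get a result from the table with class = cwl-performance.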
import bs4 as bs
import pickle
import requests

def cannabisTickers():
    resp = requests.get('https://finance.yahoo.com/u/yahoo-finance/watchlists/420_stocks/')
    soup = bs.BeautifulSoup(resp.txt, 'lxml')
    table = soup.findAll('table', {'class' : 'cwl-symbols'})
    tickers = []
    for row in table.findAll('tr'):
        ticker = row.findAll('td').text
        tickers.append(ticker)
    print(tickers)
My result is "<title>Cannabis Stocks</title>", which comes from the wrong table.
Answer 0 (score: 2)
Use the pandas library and set a User-Agent in the request headers.
The pandas tolist() method converts a Series to a list.
import requests
import pandas as pd

def cannabisTickers():
    # Send a browser-like User-Agent so the server returns the full HTML page
    resp = requests.get('https://finance.yahoo.com/u/yahoo-finance/watchlists/420_stocks/', headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'
    })
    # read_html returns one DataFrame per <table>; the symbols table is the second one
    table = pd.read_html(resp.content)[1]
    print(table['Symbol'].tolist())

if __name__ == '__main__':
    cannabisTickers()
Output:
['BUD', 'ABBV', 'MO', 'WEED.TO', 'TAP', 'CGC', 'ACB', 'SMG', 'GWPH', 'CRON', 'TLRY', 'TGOD.TO', 'TGODF', 'TRST.TO', 'CRBP', 'HYG.TO', 'CTST', 'NBEV', 'TRTC', 'CANN', 'MJ']
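Indexing the result with [1] depends on the order of the tables on the page. As a hedged alternative, pandas.read_html accepts a match argument that keeps only the tables containing text that matches a string or regex. The sketch below runs against a small inline HTML stand-in rather than the live page. The SAMPLE_HTML content, the function name symbols_from_html, and the assumption that the text "Symbol" appears only in the wanted table are all illustrative and not confirmed for the real page.

```python
import io

import pandas as pd

# Hypothetical stand-in for the page: the performance table values are made up.
SAMPLE_HTML = """
<table class="cwl-performance">
  <tr><th>Watchlist</th><th>Change Today</th></tr>
  <tr><td>Cannabis Stocks</td><td>+0.50%</td></tr>
</table>
<table class="cwl-symbols">
  <tr><th>Symbol</th><th>Company Name</th></tr>
  <tr><td>BUD</td><td>Anheuser-Busch InBev SA/NV</td></tr>
  <tr><td>ABBV</td><td>AbbVie Inc.</td></tr>
</table>
"""


def symbols_from_html(html):
    """Return the 'Symbol' column of the first table whose text matches 'Symbol'."""
    # Wrapping the string in StringIO avoids passing literal HTML directly,
    # which newer pandas versions warn about.
    tables = pd.read_html(io.StringIO(html), match="Symbol")
    return tables[0]["Symbol"].tolist()


if __name__ == "__main__":
    print(symbols_from_html(SAMPLE_HTML))
```

With the live page, the same function could be given resp.text from the request above, provided the assumption about the "Symbol" header holds there.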
Answer 1 (score: 0)
I had to set a User-Agent header before I got the correct HTML response. After that, it's just a matter of extracting the right data.
import requests
from bs4 import BeautifulSoup

if __name__ == '__main__':
    resp = requests.get('https://finance.yahoo.com/u/yahoo-finance/watchlists/420_stocks/',
                        headers={
                            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0'
                        })
    # Fail early if the response does not contain the expected table
    assert '<table class="cwl-symbols' in resp.text
    soup = BeautifulSoup(resp.text, 'html.parser')
    # select_one returns a single element, so .select can be called on it
    table = soup.select_one('.cwl-symbols')
    tickers = []
    for row in table.select('tr'):
        ticker = [cell.text for cell in row.select('td')]
        # Header rows have no <td> cells and produce an empty list; skip them
        if ticker:
            tickers.append(ticker)
    print(tickers)
Prints:
[['BUD',
'Anheuser-Busch InBev SA/NV',
'88.43',
'-0.08',
'-0.09%',
'4:02 PM EDT',
'1.02M',
'1.11M',
'173.00B'],
['ABBV',
'AbbVie Inc.',
'73.4',
'+0.68',
'+0.94%',
'4:00 PM EDT',
'17.91M',
'7.72M',
'108.51B'],
['MO',
'Altria Group, Inc.',
'47.69',
'+0.34',
'+0.72%',
'4:02 PM EDT',
'7.16M',
'8.18M',
'89.22B'],
['WEED.TO',
'Canopy Growth Corporation',
'52.87',
'-0.49',
'-0.92%',
'3:59 PM EDT',
'1.74M',
'2.16M',
'18.29B'],
['CGC',
'Canopy Growth Corporation',
'40.59',
'+0.28',
'+0.69%',
'4:00 PM EDT',
'1.72M',
'4.88M',
'14.06B'],
['TAP',
'Molson Coors Brewing Company',
'56.31',
'+0.31',
'+0.55%',
'4:02 PM EDT',
'1.24M',
'1.59M',
'12.18B'],
['ACB',
'Aurora Cannabis Inc.',
'7.83',
'+0.01',
'+0.13%',
'4:01 PM EDT',
'6.12M',
'14.84M',
'7.94B'],
['SMG',
'The Scotts Miracle-Gro Company',
'98.98',
'+0.48',
'+0.49%',
'4:02 PM EDT',
'386.87k',
'434.24k',
'5.49B'],
['CRON',
'Cronos Group Inc.',
'16.02',
'+0.04',
'+0.25%',
'4:00 PM EDT',
'3.28M',
'5.35M',
'5.37B'],
['GWPH',
'GW Pharmaceuticals plc',
'172.64',
'+0.25',
'+0.15%',
'4:00 PM EDT',
'284.32k',
'387.01k',
'5.16B'],
['TLRY',
'Tilray, Inc.',
'49.3',
'+2.74',
'+5.88%',
'4:00 PM EDT',
'1.89M',
'1.71M',
'4.79B'],
['TRST.TO',
'CannTrust Holdings Inc.',
'6.56',
'+0.01',
'+0.15%',
'4:00 PM EDT',
'698.69k',
'1.71M',
'926.17M'],
['TGOD.TO',
'The Green Organic Dutchman Holdings Ltd.',
'3.23',
'+0.05',
'+1.57%',
'4:00 PM EDT',
'534.93k',
'2.03M',
'886.22M'],
['CTST',
'CannTrust Holdings Inc.',
'5.12',
'+0.10',
'+1.99%',
'4:02 PM EDT',
'1.52M',
'2.89M',
'721.33M'],
['TGODF',
'The Green Organic Dutchman Holdings Ltd.',
'2.52',
'+0.04',
'+1.61%',
'3:59 PM EDT',
'410.14k',
'655.39k',
'676.91M'],
['CRBP',
'Corbus Pharmaceuticals Holdings, Inc.',
'7.08',
'+0.15',
'+2.16%',
'4:00 PM EDT',
'631.39k',
'999.85k',
'456.34M'],
['HYG.TO',
'Hydrogenics Corporation',
'19.47',
'-0.84',
'-4.14%',
'3:52 PM EDT',
'57.50k',
'9.21k',
'369.68M'],
['NBEV',
'New Age Beverages Corporation',
'4.75',
'+0.09',
'+1.93%',
'4:00 PM EDT',
'2.50M',
'5.13M',
'365.03M'],
['TRTC',
'Terra Tech Corp.',
'0.58725',
'+0.01',
'+1.95%',
'3:50 PM EDT',
'371.50k',
'795.50k',
'60.59M'],
['CANN',
'General Cannabis Corp',
'0.76',
'-0.05',
'-5.59%',
'3:59 PM EDT',
'233.49k',
'221.00k',
'27.55M'],
['MJ',
'ETFMG Alternative Harvest ETF',
'32.1',
'+0.42',
'+1.33%',
'4:00 PM EDT',
'244.55k',
'514.25k',
'-']]
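The question asked only for the ticker symbols, while the loop above collects every cell of each row. Here is a minimal sketch of keeping just the first cell per row. It also shows why the original code could not work as written: findAll returns a list-like ResultSet, which has no findAll or .text of its own, whereas select_one (or find) returns a single element. The code runs against a small inline HTML stand-in rather than the live page. SAMPLE_HTML, the made-up performance table values, and the function name extract_symbols are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page: the performance table values are made up.
SAMPLE_HTML = """
<table class="cwl-performance">
  <tr><th>Watchlist</th><th>Change Today</th></tr>
  <tr><td>Cannabis Stocks</td><td>+0.50%</td></tr>
</table>
<table class="cwl-symbols">
  <tr><th>Symbol</th><th>Company Name</th></tr>
  <tr><td>BUD</td><td>Anheuser-Busch InBev SA/NV</td></tr>
  <tr><td>ABBV</td><td>AbbVie Inc.</td></tr>
</table>
"""


def extract_symbols(html):
    """Return the text of the first <td> in each row of the cwl-symbols table."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one returns a single Tag (or None), not a ResultSet
    table = soup.select_one("table.cwl-symbols")
    if table is None:
        return []
    symbols = []
    for row in table.select("tr"):
        first_cell = row.find("td")  # None for header rows that only hold <th>
        if first_cell is not None:
            symbols.append(first_cell.get_text(strip=True))
    return symbols


if __name__ == "__main__":
    print(extract_symbols(SAMPLE_HTML))  # ['BUD', 'ABBV']
```

Returning an empty list when the table is missing is a design choice for this sketch. With the live page, a missing table most likely means the request was served a different page, which is what the assert in the answer above guards against.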