Web scraping with Beautiful Soup in Python

Date: 2020-03-25 07:27:19

Tags: python web-scraping

I am trying to pull the data from the table labeled "Most Active" in the "Stock Spotlight" section of: https://markets.on.nytimes.com/research/markets/overview/overview.asp

and print something like this:

[('General Electric Co', '6.52'),
 ('Tonix Pharmaceuticals Holding Corp', '1.06'),
 ('Carnival Corp', '12.00'),
 ('Uber Technologies Inc', '21.33'),
 ('American Airlines Group Inc', '10.33'),
 ('MGM Resorts International', '9.11'),
 ('Snap Inc', '10.09'),
 ('Halliburton Co', '5.05')]

My code:

import requests
from bs4 import BeautifulSoup

url = 'https://markets.on.nytimes.com/research/markets/overview/overview.asp'

def pull_active(url):

    import requests
    from bs4 import BeautifulSoup

    response     = requests.get(url)
    results_page = BeautifulSoup(response.content,'lxml')
    data         = results_page.find_all('table', class_='stock-spotlight-table') # ???  
    table        = data.append(tbody.get_text()) # ??? the html element that contains multiple <tr> elements 

    table_rows   = []
    for i in table:
        label    = i.find('td', class_='truncateMeTo1').text # ?
        val      = i.find('td', class_='colPrimary'   ).text # ?
        table_rows.append((Stocks, Latest))             # ??? add the labels and values to the empty list as tuples 
    return table_rows

pull_active(url)

When I run the code above, nothing happens. What am I doing wrong?
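For reference, the loop above never produces output as written: find_all returns a ResultSet rather than a single table, the names tbody, Stocks, and Latest are never defined, and the value returned by pull_active(url) is never printed. A minimal corrected sketch of the same BeautifulSoup approach, assuming the class names used in the question (stock-spotlight-table, truncateMeTo1, colPrimary) still match the page's markup:

import requests
from bs4 import BeautifulSoup

def pull_active(url):
    # Fetch the page and parse it; the class names below are carried over
    # from the question and are assumed to match the page's markup.
    response = requests.get(url)
    results_page = BeautifulSoup(response.content, 'lxml')

    table = results_page.find('table', class_='stock-spotlight-table')
    if table is None:
        return []

    table_rows = []
    # Each <tr> holds one stock; grab the company-name and latest-price cells.
    for row in table.find_all('tr'):
        label = row.find('td', class_='truncateMeTo1')
        val = row.find('td', class_='colPrimary')
        if label and val:
            table_rows.append((label.get_text(strip=True), val.get_text(strip=True)))
    return table_rows

url = 'https://markets.on.nytimes.com/research/markets/overview/overview.asp'
print(pull_active(url))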

1 Answer:

Answer 0 (score: 0):

Try the code below. Basically, read_html reads every table on the page, and you can then pick out the one you need.

import requests
import pandas as pd

url = 'https://markets.on.nytimes.com/research/markets/overview/overview.asp'
html = requests.get(url).content

# read_html parses every <table> on the page into a list of DataFrames
df_list = pd.read_html(html)

# take the table you need from that list
df = df_list[0]
print(df)
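If df_list[0] turns out not to be the Most Active table, read_html can be pointed at the right one directly, for example by matching the table's HTML attributes (the stock-spotlight-table class here is an assumption carried over from the question) or a string the table must contain:

import requests
import pandas as pd

url = 'https://markets.on.nytimes.com/research/markets/overview/overview.asp'
html = requests.get(url).content

# Parse only tables whose class attribute matches (class name assumed from the question).
tables = pd.read_html(html, attrs={'class': 'stock-spotlight-table'})

# Alternatively, keep only tables containing a known company name:
# tables = pd.read_html(html, match='General Electric')

print(tables[0])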