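I am trying to extract the data from the table labelled "Most Active" in the "Stock Spotlight" section of https://markets.on.nytimes.com/research/markets/overview/overview.asp. The result should be a list of tuples like this: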
[('General Electric Co', '6.52'),
('Tonix Pharmaceuticals Holding Corp', '1.06'),
('Carnival Corp', '12.00'),
('Uber Technologies Inc', '21.33'),
('American Airlines Group Inc', '10.33'),
('MGM Resorts International', '9.11'),
('Snap Inc', '10.09'),
('Halliburton Co', '5.05')]
This is my code so far:

import requests
from bs4 import BeautifulSoup

url = 'https://markets.on.nytimes.com/research/markets/overview/overview.asp'

def pull_active(url):
    import requests
    from bs4 import BeautifulSoup
    response = requests.get(url)
    results_page = BeautifulSoup(response.content, 'lxml')
    data = results_page.find_all('table', class_='stock-spotlight-table')  # ???
    table = data.append(tbody.get_text())  # ??? the html element that contains multiple <tr> elements
    table_rows = []
    for i in table:
        label = i.find('td', class_='truncateMeTo1').text  # ?
        val = i.find('td', class_='colPrimary').text  # ?
        table_rows.append((Stocks, Latest))  # ??? add the labels and values to the empty list as tuples
    return table_rows

pull_active(url)
When I run the code above, nothing happens. What am I doing wrong?
Answer 0 (score: 0)
Try the code below. Basically, read_html reads every table on the page into a list, and you can then pick out the one you need.
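import requests
import pandas as pd

url = 'https://markets.on.nytimes.com/research/markets/overview/overview.asp'
html = requests.get(url).content

# read_html returns a list of DataFrames, one per <table> found in the HTML
df_list = pd.read_html(html)

# Take the first table; change the index if the table you want is not the first one
df = df_list[0]
print(df)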
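If you would rather keep the BeautifulSoup approach from the question, a few things in that function stand out: find_all returns a list of tables rather than a single one, tbody is never defined, list.append returns None (so table ends up as None), and the tuple is built from the undefined names Stocks and Latest instead of label and val. The result of pull_active(url) is also never printed. Below is a minimal sketch of a corrected version. The class names are taken from the question, but SAMPLE_HTML is a hypothetical stand-in for the page markup and parse_active is a helper name I made up, so treat the selectors as assumptions to check against the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page; the class names come from the
# question, the surrounding structure is an assumption.
SAMPLE_HTML = """
<table class="stock-spotlight-table">
  <tbody>
    <tr><th>Stocks</th><th>Latest</th></tr>
    <tr><td class="truncateMeTo1">General Electric Co</td><td class="colPrimary">6.52</td></tr>
    <tr><td class="truncateMeTo1">Carnival Corp</td><td class="colPrimary">12.00</td></tr>
  </tbody>
</table>
"""


def parse_active(html):
    """Return (name, latest) tuples from the first spotlight table in html."""
    soup = BeautifulSoup(html, "html.parser")
    # find() returns a single element (or None), unlike find_all()
    table = soup.find("table", class_="stock-spotlight-table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        label = tr.find("td", class_="truncateMeTo1")
        value = tr.find("td", class_="colPrimary")
        if label is None or value is None:
            continue  # skip header rows or rows missing either cell
        rows.append((label.get_text(strip=True), value.get_text(strip=True)))
    return rows


def pull_active(url):
    """Download the page and parse it; needs network access."""
    import requests  # imported here so parse_active can be used offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return parse_active(response.content)


print(parse_active(SAMPLE_HTML))
```

Splitting the parsing from the download makes it possible to test the selectors against a saved copy of the page. If the live page returns an empty list, the table may be filled in by JavaScript or the class names may differ from the ones above.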