Question

When I inspect the table in the browser, I see:

<table class="wikitable sortable jquery-tablesorter">

So I tried the following in Python:

import requests
import bs4 as bs

r = requests.get("https://en.wikipedia.org/wiki/Comparison_of_Intel_processors")
x = bs.BeautifulSoup(r.content)
x.find_all("table",{"class":"wikitable sortable jquery-tablesorter"})

However, I get an empty list. Any ideas?

Answer 1

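The table class wikitable sortable jquery-tablesorter does not appear when you first load the page; it only shows up once the table has been sorted in the browser, and requests does not run the page's JavaScript. I was able to get one table by using the class wikitable sortable instead.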

import requests
from bs4 import BeautifulSoup

res = requests.get("https://en.wikipedia.org/wiki/Comparison_of_Intel_processors")
soup = BeautifulSoup(res.content, "html.parser")
tables = soup.find_all("table", class_="wikitable sortable")
print(len(tables))

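Notes:

Following the stanford.edu tutorial on Beautiful Soup, I used class_= rather than a dictionary as in your example.
The parser is passed explicitly to the BeautifulSoup class as "html.parser", so the code behaves the same across different environments, as the printed warning suggests.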
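
The points above can be checked offline. The following is a minimal sketch, assuming a hypothetical SERVER_HTML string that stands in for the markup the server sends before any JavaScript runs; the variable names are illustrative and not from the original answer. It compares the three-class search from the question with the two-class search, the dictionary form, and a CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw HTML returned by the server
# (no jquery-tablesorter class yet, since no JavaScript has run).
SERVER_HTML = """
<table class="wikitable sortable">
  <tr><th>Processor</th></tr>
  <tr><td>4004</td></tr>
</table>
"""

soup = BeautifulSoup(SERVER_HTML, "html.parser")

# The class string copied from the browser inspector is not in the raw HTML.
with_js_class = soup.find_all("table", class_="wikitable sortable jquery-tablesorter")

# A multi-word string is compared against the exact class attribute value.
exact_string = soup.find_all("table", class_="wikitable sortable")

# The dictionary form from the question is equivalent to class_=.
dict_form = soup.find_all("table", {"class": "wikitable sortable"})

# A CSS selector matches on individual classes, in any order.
css_selector = soup.select("table.sortable.wikitable")

print(len(with_js_class), len(exact_string), len(dict_form), len(css_selector))
```

Only the first search comes back empty. The CSS selector is probably the more robust choice, since it does not depend on the order of the classes or on extra classes being present.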

Answer 2

Try the following. It fetches the table data from that page:

import requests
from bs4 import BeautifulSoup

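res = requests.get("https://en.wikipedia.org/wiki/Comparison_of_Intel_processors")
# If "lxml" is not installed or causes problems, use "html.parser" instead.
soup = BeautifulSoup(res.text, "lxml")
# class_="wikitable" matches any table whose class list includes "wikitable".
table = soup.find("table", class_="wikitable")
# The [:-1] slice skips the last row of the table.
for items in table.find_all("tr")[:-1]:
    # Collapse runs of whitespace inside each header or data cell.
    data = [" ".join(item.text.split()) for item in items.find_all(["th", "td"])]
    print(data)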

Partial output:

['Processor', 'Series Nomenclature', 'Code Name', 'Production Date', 'Supported Features (Instruction Set)', 'Clock Rate', 'Socket', 'Fabrication', 'TDP', 'Number of Cores', 'Bus Speed', 'L1 Cache', 'L2 Cache', 'L3 Cache', 'Overclock Capable']
['4004', '', '', 'Nov. 15,1971', '', '740 kHz', 'DIP', '10-micron', '', '1 740 kHz', 'N/A', 'N/A', 'N/A']
['8008', 'N/A', 'N/A', 'April 1972', 'N/A', '200 kHz - 800 kHz', 'DIP', '10-micron', '', '1', '200 kHz', 'N/A', 'N/A', 'N/A', '']
['8080', 'N/A', 'N/A', 'April 1974', 'N/A', '2 MHz - 3.125 MHz', 'DIP', '6-micron', '', '1', '2 MHz', 'N/A', 'N/A', 'N/A', '']
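
The partial output above contains rows of different lengths (the 4004 row has fewer cells than the header), so it may help to check for that before pairing cells with column names. This is a rough sketch, not part of the original answer: extract_rows is a hypothetical helper, SAMPLE_HTML is made-up inline markup used so the example runs without network access, and unlike the loop above it keeps the last row.

```python
from bs4 import BeautifulSoup


def extract_rows(html, table_class="wikitable"):
    """Return the first matching table as a list of rows (lists of cell text)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Normalize whitespace in every header (th) and data (td) cell.
        cells = [" ".join(cell.get_text().split()) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


# Made-up sample markup; the second row is deliberately one cell short.
SAMPLE_HTML = """
<table class="wikitable sortable">
  <tr><th>Processor</th><th>Clock Rate</th><th>Socket</th></tr>
  <tr><td>4004</td><td>740 kHz</td></tr>
  <tr><td>8008</td><td>200 kHz -
      800 kHz</td><td>DIP</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
header, body = rows[0], rows[1:]

# Rows whose cell count differs from the header need manual attention.
ragged = [row for row in body if len(row) != len(header)]

print(header)
print(ragged)
```

Rows flagged as ragged usually come from merged or missing cells, so zipping them with the header by position would likely mislabel some values.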

BeautifulSoup cannot extract a table from a wiki page
