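Although I don't seem to be the first person to run into this problem, I haven't been able to find an answer.
I am scraping an HTML table, and although I try to iterate over it, I only get the first row of the table.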
import requests
from bs4 import BeautifulSoup
# Webpage connection
html = "https://www.wegochem.com/chemicals/organic-intermediates/supplier-distributor"
r=requests.get(html)
c=r.content
soup=BeautifulSoup(c,"html.parser")
# Grab title-artist classes and store in recordList
wegoList = soup.find_all("tbody")
try:
    for items in wegoList:
        material = items.find("td", {"class": "click_whole_cell",}).get_text().strip()
        cas = items.find("td", {"class": "text-center",}).get_text().strip()
        category = items.find("div", {"class": "text-content short-text",}).get_text().strip()
        print(material,cas,category)
except:
    pass
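The result for the first row is correct (1,2-Dimethylimidazole 1739-84-0 Organic Intermediates, Plastic, Resin & Rubber, Coatings), but the for loop does not iterate over the rest of the table.
Thanks for your help.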
Answer 0 (score: 0)
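The loop "for items in wegoList:" iterates over the list of tbody elements, so each find() call searches the whole table body and returns only its first match. You should iterate over each tr row instead: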
wegoList = soup.find_all("tbody")
# Re-parse the tbody markup so that every row can be searched individually
soup = BeautifulSoup(str(wegoList), "html.parser")
trs = soup.find_all("tr")  # Makes list of rows
for tr in trs:
    material = tr.find("td", {"class": "click_whole_cell"}).get_text().strip()
    cas = tr.find("td", {"class": "text-center"}).get_text().strip()
    category = tr.find("div", {"class": "text-content short-text"}).get_text().strip()
    print(material, cas, category)
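The difference between searching the whole tbody and searching each tr can be checked offline. The following is a minimal sketch, assuming a made-up two-row table (SAMPLE_HTML and its contents are hypothetical and only mimic the class names from the question, not the real page):

```python
from bs4 import BeautifulSoup

# Hypothetical two-row table that reuses the class names from the question.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr>
      <td class="click_whole_cell">Material A</td>
      <td class="text-center">111-11-1</td>
      <td><div class="text-content short-text">Category A</div></td>
    </tr>
    <tr>
      <td class="click_whole_cell">Material B</td>
      <td class="text-center">222-22-2</td>
      <td><div class="text-content short-text">Category B</div></td>
    </tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tbody = soup.find("tbody")

# find() on the whole tbody stops at the first matching cell.
first_only = tbody.find("td", {"class": "click_whole_cell"}).get_text().strip()

# Calling find() once per row yields one match for every row.
per_row = [
    tr.find("td", {"class": "click_whole_cell"}).get_text().strip()
    for tr in tbody.find_all("tr")
]

print(first_only)
print(per_row)
```

Here first_only holds a single value no matter how many rows the table has, while per_row grows with the table, which matches the behaviour described in the question.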
Answer 1 (score: 0)
Try this code:
import requests
from bs4 import BeautifulSoup

# Webpage connection
html = "https://www.wegochem.com/chemicals/organic-intermediates/supplier-distributor"
r = requests.get(html)
c = r.content
soup = BeautifulSoup(c, "html.parser")

# Grab every table body on the page
wegoList = soup.find_all("tbody")
for items in wegoList:
    # find_all() returns every matching tag, not just the first one
    material = items.find_all("td", {"class": "click_whole_cell"})
    for i in material:
        print(i.get_text().strip())
    cas = items.find_all("td", {"class": "text-center"})
    for i in cas:
        print(i.get_text().strip())
    category = items.find_all("div", {"class": "text-content short-text"})
    for i in category:
        print(i.get_text().strip())
Updated code, which prints the three fields of each row on one line:
import requests
from bs4 import BeautifulSoup

# Webpage connection
html = "https://www.wegochem.com/chemicals/organic-intermediates/supplier-distributor"
r = requests.get(html)
c = r.content
soup = BeautifulSoup(c, "html.parser")

# Grab every table body on the page
wegoList = soup.find_all("tbody")
for items in wegoList:
    material = items.find_all("td", {"class": "click_whole_cell"})
    cas = items.find_all("td", {"class": "text-center"})
    category = items.find_all("div", {"class": "text-content short-text"})
    # zip() pairs the n-th material, CAS number and category together
    for i in zip(material, cas, category):
        print(i[0].get_text().strip(), i[1].get_text().strip(), i[2].get_text().strip())
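One caveat with the zip() approach: zip() stops at the shortest list, so if any row lacks one of the three tags, the columns can shift against each other or rows can be dropped. A possible alternative, sketched here under the assumption that each record lives in its own tr, is to extract the fields row by row and substitute an empty string for a missing cell. The helper names cell_text and extract_rows, and the sample markup, are hypothetical and not part of the original answers:

```python
from bs4 import BeautifulSoup


def cell_text(row, name, css_class):
    """Return the stripped text of the first matching tag in row, or '' if absent."""
    tag = row.find(name, {"class": css_class})
    return tag.get_text().strip() if tag is not None else ""


def extract_rows(html):
    """Collect (material, cas, category) tuples, one per tr inside any tbody."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tbody in soup.find_all("tbody"):
        for tr in tbody.find_all("tr"):
            rows.append((
                cell_text(tr, "td", "click_whole_cell"),
                cell_text(tr, "td", "text-center"),
                cell_text(tr, "div", "text-content short-text"),
            ))
    return rows


# Hypothetical markup: the second row has no CAS cell.
SAMPLE_HTML = """
<table><tbody>
  <tr>
    <td class="click_whole_cell">Material A</td>
    <td class="text-center">111-11-1</td>
    <td><div class="text-content short-text">Category A</div></td>
  </tr>
  <tr>
    <td class="click_whole_cell">Material B</td>
    <td><div class="text-content short-text">Category B</div></td>
  </tr>
</tbody></table>
"""

records = extract_rows(SAMPLE_HTML)
for material, cas, category in records:
    print(material, cas, category)
```

With this layout the second record keeps its material and category and simply has an empty CAS field, whereas the zip() version would print only the first row for the same markup.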