I am fairly new to Python and BeautifulSoup. I am working through the web scraping examples in Ryan Mitchell's book. The site I am scraping is http://www.pythonscraping.com/pages/page3.html
from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
bs0bj = BeautifulSoup(html, "html.parser")
for i in bs0bj.find_all(id="gift1"):
    print(i.get_text())
#for i in bs0bj.find_all("tr", {"class":"gift"}):
#    print(i)
#for c in bs0bj.find_all("img", {"src":re.compile(r"\.\./img/gifts/img.*\.jpg")}):
#    print(c["src"])
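My problem is that I want to scrape the first row of the gift table, i.e. the header titles ("Item Title, Description, Cost, Image"), as well as the image names such as .../img/gift.jpg, but so far I have not been able to do it. Could someone please help me write the correct code?
Please also explain the code so that I can understand it, and make sure the output comes without the tags.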
Answer 0: (score: 1)
Is this what you are looking for?
Answer 1: (score: 0)
Here is the code:
from urllib.request import urlopen
from bs4 import BeautifulSoup

html = urlopen("http://www.pythonscraping.com/pages/page3.html")
soup = BeautifulSoup(html, "html.parser")

# find_all returns a list of matches; take the first table with this id
my_table = soup.find_all("table", id="giftList")[0]

# Walk every row; the header row holds <th> cells, so it yields no <td>
rows = my_table.find_all("tr")
for row in rows:
    cells = row.find_all("td")
    for cell in cells:
        # get_text() also handles cells with nested tags, where .string is None
        value = cell.get_text(strip=True)
        print("The value in this cell is %s" % value)
There is also plenty of help available online that you can look through.
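
For the part the question actually asks about (the header row and the image paths), here is a minimal sketch rather than a definitive solution. It assumes the table has id giftList, that the column titles sit in th cells, and that the image paths look like ../img/gifts/img1.jpg. The function name scrape_gift_table and the SAMPLE_HTML snippet are hypothetical stand-ins I made up so the example runs without a network connection; they are not taken from the page itself.

```python
import re
from bs4 import BeautifulSoup

# Hand-written stand-in for the gift table (an assumption, not the live page).
SAMPLE_HTML = """
<table id="giftList">
  <tr><th>Item Title</th><th>Description</th><th>Cost</th><th>Image</th></tr>
  <tr id="gift1" class="gift">
    <td>Vegetable Basket</td><td>A basket of vegetables</td><td>$15.00</td>
    <td><img src="../img/gifts/img1.jpg"></td>
  </tr>
  <tr id="gift2" class="gift">
    <td>Russian Nesting Dolls</td><td>Hand-painted dolls</td><td>$10,000.52</td>
    <td><img src="../img/gifts/img2.jpg"></td>
  </tr>
</table>
"""


def scrape_gift_table(html):
    """Return (header titles, image src values) from the giftList table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="giftList")
    if table is None:
        # No such table in the document: nothing to report.
        return [], []

    # get_text() returns only the text of each <th>, without the tags.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]

    # Keep only <img> tags whose src attribute matches the gift image pattern.
    pattern = re.compile(r"\.\./img/gifts/img.*\.jpg")
    images = [img["src"] for img in table.find_all("img", src=pattern)]
    return headers, images


headers, images = scrape_gift_table(SAMPLE_HTML)
print(headers)
print(images)
```

To try it against the real page, pass urlopen("http://www.pythonscraping.com/pages/page3.html").read() to the function instead of SAMPLE_HTML. Note that the src value is read from the img tag itself with img["src"]; an img tag has no child called image, so something like c.image["src"] fails.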