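I want to fetch the contents of a table each time it is updated, using BeautifulSoup. Why doesn't this code work? It either returns no output or sometimes throws an exception.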
from bs4 import BeautifulSoup
import urllib2
url = "http://tenders.ongc.co.in/wps/portal/!ut/p/b1/04_Sj9CPykssy0xPLMnMz0vMAfGjzOINLc3MPB1NDLwsPJ1MDTzNPcxMDYJCjA0MzIAKIoEKDHAARwNC-sP1o8BK8Jjg55Gfm6pfkBthoOuoqAgArsFI6g!!/pw/Z7_1966IA40J8IB50I7H650RT30D2/ren/m=view/s=normal/p=struts.portlet.action=QCPtenderHomeQCPlatestTenderListAction/p=struts.portlet.mode=view/=/#Z7_1966IA40J8IB50I7H650RT30D2"
page = urllib2.urlopen(url)
soup = BeautifulSoup(page)
divcontent = soup.find('div', {"id":"latestTrPagging", "class":"content2"})
table = soup.find_all('table')
rows = table.findAll('tr', {"class":"even", "class": "odd"})
for row in rows:
    cols = row.findAll('td', {"class":"tno"})
    for td in cols:
        print td.text(text=True)
Answer 0 (score: 0)
This worked for me: use requests instead of urllib2, set the User-Agent header, and adjust some of the locators:
from bs4 import BeautifulSoup
import requests
url = "https://tenders.ongc.co.in/wps/portal/!ut/p/b1/04_Sj9CPykssy0xPLMnMz0vMAfGjzOINLc3MPB1NDLwsPJ1MDTzNPcxMDYJCjA0MzIAKIoEKDHAARwNC-sP1o8BK8Jjg55Gfm6pfkBthoOuoqAgArsFI6g!!/pw/Z7_1966IA40J8IB50I7H650RT30D2/ren/m=view/s=normal/p=struts.portlet.action=QCPtenderHomeQCPlatestTenderListAction/p=struts.portlet.mode=view/=/#Z7_1966IA40J8IB50I7H650RT30D2"
page = requests.get(url, headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/53.0.2785.116 Safari/537.36"})
soup = BeautifulSoup(page.content, "html.parser")
divcontent = soup.find('div', {"id": "latestTrPagging", "class": "content2"})
table = soup.find('table')
rows = table.find_all('tr', {"class": ["even", "odd"]})
for row in rows:
    cols = row.find_all('td', {"class": "tno"})
    for td in cols:
        print(td.get_text())
This prints the first 10 tender numbers:
LC1MC16044[NIT]
LC1MC16043[NIT]
LC1MC16045[NIT]
EY1VC16028[NIT]
RC2SC16050(E -tender)[NIT]
RC2SC16048(E -tender)[NIT]
RC2SC16049(E -tender)[NIT]
UI1MC16002[NIT]
V16RC16015[E-Gas]
K16AC16002[E-Procurement]
Note how multiple classes ("even" and "odd") are handled: they are passed as a list under a single "class" key.
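One reason the row filter in the question could not match both classes can be seen in plain Python, without BeautifulSoup or a network request. The sketch below is only an illustration; the variable names question_filter and answer_filter are made up here and do not appear in the original code.

```python
# A dict literal with a repeated key keeps only the last value,
# so the "even" entry from the question's filter is silently dropped.
question_filter = {"class": "even", "class": "odd"}
print(question_filter)  # {'class': 'odd'}

# Putting both names in a list under one key preserves them,
# which is the form used in the answer's find_all() call.
answer_filter = {"class": ["even", "odd"]}
print(answer_filter["class"])  # ['even', 'odd']
```

Separately, the question calls findAll on the result of soup.find_all('table'), which is a list of tables rather than a single table; the answer avoids this by using soup.find('table').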