The following Python module checks whether a given item exists on flipkart.com:
import sys
import bs4
import re
import urllib2


def findItem(itemName):
    # str.replace returns a new string, so the result must be assigned back.
    itemName = itemName.replace(" ", "+")
    link = 'http://www.flipkart.com/search/a/all?query={0}&vertical=all&dd=0&autosuggest[as]=off&autosuggest[as-submittype]=entered&autosuggest[as-grouprank]=0&autosuggest[as-overallrank]=0&autosuggest[orig-query]=&autosuggest[as-shown]=off&Search=%C2%A0&otracker=start&_r=YSWdYULYzr4VBYklfpZRbw--&_l=pMHn9vNCOBi05LKC_PwHFQ--&ref=a2c6fadc-2e24-4412-be6a-ce02c9707310&selmitem=All+Categories'.format(itemName)
    r = urllib2.Request(link, headers={"User-Agent": "Python-urlli~"})
    try:
        response = urllib2.urlopen(r)
    except:
        print "Internet connection error"
        return
    thePage = response.read()
    soup = bs4.BeautifulSoup(thePage)
    # Look for the first product block in the search results.
    firstBlockSoup = soup.find('div', attrs={'class': 'size1of4 fk-medium-atom unit'})
    if not firstBlockSoup:
        print "Item Not Found"
        return
    else:
        print "Item found"
        return

The module above works for some products on flipkart.com, but not for all of them.
For example, it works for:
findItem("galaxy s advance")
but not for:
findItem("Giordano Analog Watch")
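If you look at the page source for these two products on flipkart.com (preferably with "Inspect element") and compare it with the code, the reason is obvious: the two result pages wrap their products in div elements with different class values.
Can anyone suggest a foolproof way to accomplish this task?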
Answer 0 (score: 2)
What if you split it into two checks:
import urllib2
import bs4


def findItem(itemName):
    # str.replace returns a new string, so the result must be assigned back.
    itemName = itemName.replace(" ", "+")
    link = 'http://www.flipkart.com/search/a/all?query={0}&vertical=all&dd=0&autosuggest[as]=off&autosuggest[as-submittype]=entered&autosuggest[as-grouprank]=0&autosuggest[as-overallrank]=0&autosuggest[orig-query]=&autosuggest[as-shown]=off&Search=%C2%A0&otracker=start&_r=YSWdYULYzr4VBYklfpZRbw--&_l=pMHn9vNCOBi05LKC_PwHFQ--&ref=a2c6fadc-2e24-4412-be6a-ce02c9707310&selmitem=All+Categories'.format(itemName)
    r = urllib2.Request(link, headers={"User-Agent": "Python-urlli~"})
    try:
        response = urllib2.urlopen(r)
    except:
        print "Internet connection error"
        return
    thePage = response.read()
    soup = bs4.BeautifulSoup(thePage)
    # First check: one result layout marks products with 'product-unit'.
    firstBlockSoup = soup.find('div', attrs={'class': 'product-unit'})
    if not firstBlockSoup:
        # Second check: fall back to the class used by the other result layout.
        firstBlockSoup = soup.find('div', attrs={'class': 'size1of4 fk-medium-atom unit'})
    if not firstBlockSoup:
        print "Item Not Found"
        return
    print "Item found"
    return


findItem("galaxy s advance")
findItem("Giordano Analog Watch")
findItem("nosuchitemfound")

Prints:
Item found
Item found
Item Not Found
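
If more result layouts turn up, the two checks can be generalized to a list of candidate class values that are tried in order. The following is only a rough sketch under several assumptions: it is written for Python 3 and uses the standard library's html.parser instead of BeautifulSoup, the helper names (first_matching_class, CANDIDATE_CLASSES) are made up for illustration, and the sample markup is hypothetical rather than real flipkart.com output. It also treats a candidate as matched when all of its classes appear on some div, which is slightly looser than the exact-string match in the code above.

```python
from html.parser import HTMLParser

# Class values taken from the two checks above; extend the tuple as needed.
CANDIDATE_CLASSES = ("product-unit", "size1of4 fk-medium-atom unit")


class _DivClassCollector(HTMLParser):
    """Records the class attribute of every <div> in the document."""

    def __init__(self):
        super().__init__()
        self.div_classes = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # A missing or valueless class attribute is stored as "".
            self.div_classes.append(dict(attrs).get("class") or "")


def first_matching_class(html, candidates=CANDIDATE_CLASSES):
    """Return the first candidate whose classes all appear on some <div>, else None."""
    collector = _DivClassCollector()
    collector.feed(html)
    collector.close()
    for candidate in candidates:
        wanted = set(candidate.split())
        if any(wanted <= set(found.split()) for found in collector.div_classes):
            return candidate
    return None


# Hypothetical page fragments, not real flipkart.com markup.
phone_page = '<div class="size1of4 fk-medium-atom unit">Galaxy S Advance</div>'
watch_page = '<div class="product-unit">Giordano Analog Watch</div>'
empty_page = '<div class="message">0 results found</div>'

for page in (phone_page, watch_page, empty_page):
    print("Item found" if first_matching_class(page) else "Item Not Found")
```

Keeping the candidates in one tuple means a new layout only needs one more entry, not another nested if.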
Another approach is to check for the "no results" page instead. For example, just check whether "0 results found" in soup.text is true.
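
The "no results" check could look roughly like the sketch below. It is an illustration only, under these assumptions: Python 3, the standard library's html.parser as a rough stand-in for soup.text, hypothetical helper names (page_text, has_results), and made-up sample markup. The marker string "0 results found" is the one suggested above; whether the site still uses that wording would need to be verified.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document (a rough stand-in for soup.text)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_text(html):
    """Return the concatenated text content of the page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def has_results(html, marker="0 results found"):
    """Return False when the no-results marker appears in the page text."""
    return marker.lower() not in page_text(html).lower()


# Hypothetical page fragments, not real flipkart.com markup.
empty_page = "<html><body><div>0 results found for <b>nosuchitemfound</b></div></body></html>"
hit_page = "<html><body><div class='product-unit'>Galaxy S Advance</div></body></html>"

print(has_results(empty_page))  # False
print(has_results(hit_page))    # True
```

This check does not depend on the class names of the product blocks, but it breaks if the wording of the "no results" message changes, so combining it with the class-based checks may be more robust.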