Question

My problem is the same as "Scraping all mobiles of Flipkart.com". I tried the solution given there, but changing the start variable has no effect, and I can only scrape the first 20 mobiles.

import re  # regular expressions for pulling out the href values
import urllib.request  # in Python 2 this was plain urllib

from bs4 import BeautifulSoup

url = "http://www.flipkart.com/mobiles/samsung~brand/pr?sid=tyy%2C4io&start=50"

regex = '<a href=(.+?)>'  # captures the href value of each link
pattern = re.compile(regex)

htmlfile = urllib.request.urlopen(url)
htmltext = htmlfile.read()

docSoup = BeautifulSoup(htmltext, "html.parser")
abc = docSoup.find_all('a')

c = str(abc)  # the matched anchor tags serialized back to HTML text
title = re.findall(pattern, c)

for i in title:
    print(i)

The initial value of start was 21, so I increased it to 50, but I still get the same results.

Answer 1

When the site shows more content, it sends another request, and that is the request you have to send yourself:

http://www.flipkart.com/mobiles/samsung~brand/pr?p%5B%5D=sort%3Dfeatured&sid=tyy%2C4io&start=61&ajax=true

http://www.flipkart.com/mobiles/samsung~brand/pr?p%5B%5D=sort%3Dfeatured&sid=tyy%2C4io&start=81&ajax=true

I found this using HttpFox, but you can also see it in the Network tab of Chrome's developer tools.

Note that the first request contains start=61 and the second contains start=81.

By the way, I personally use requests rather than urllib.
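
As a rough sketch of how those AJAX URLs could be generated rather than typed by hand, the snippet below rebuilds the query string with the standard library. The names BASE_URL, build_page_url and page_urls are made up for illustration. The step of 20 is inferred from the 61 and 81 values above, and the default first value of 21 comes from the question, so both may not match what the site actually expects.

```python
from urllib.parse import urlencode

# Path taken from the URLs quoted above.
BASE_URL = "http://www.flipkart.com/mobiles/samsung~brand/pr"


def build_page_url(start):
    """Return the AJAX URL for the results beginning at `start`."""
    # A list of pairs keeps the parameters in the same order as the captured requests.
    query = urlencode([
        ("p[]", "sort=featured"),
        ("sid", "tyy,4io"),
        ("start", start),
        ("ajax", "true"),
    ])
    return BASE_URL + "?" + query


def page_urls(first=21, step=20, pages=4):
    """Return one URL per page (assumption: start grows by `step` each time)."""
    return [build_page_url(first + step * n) for n in range(pages)]
```

Each URL can then be fetched with urllib.request.urlopen(url), or with requests.get(url) if requests is installed.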

Answer 2

The page makes 4 AJAX requests (check the screenshot). Try writing code that changes start dynamically for each request, and use try/except to handle any exceptions.
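
A minimal sketch of that idea follows, assuming the AJAX URL format quoted in the first answer and a start value that begins at 21 and grows by 20 per request. URL_TEMPLATE, fetch and scrape_all are hypothetical names, and the count of 4 pages is simply the number of requests mentioned above.

```python
import urllib.error
import urllib.request

# Query string copied from the AJAX URLs in the first answer; only start varies.
URL_TEMPLATE = (
    "http://www.flipkart.com/mobiles/samsung~brand/pr"
    "?p%5B%5D=sort%3Dfeatured&sid=tyy%2C4io&start={start}&ajax=true"
)


def fetch(url):
    """Download one page fragment and return its body as bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def scrape_all(first=21, step=20, pages=4, fetch_page=fetch):
    """Request `pages` fragments, advancing start by `step` each time."""
    fragments = []
    for start in range(first, first + step * pages, step):
        url = URL_TEMPLATE.format(start=start)
        try:
            fragments.append(fetch_page(url))
        except (urllib.error.URLError, OSError) as exc:
            # Stop at the first failed request instead of crashing.
            print("stopping at start=%d: %s" % (start, exc))
            break
    return fragments
```

Calling scrape_all() returns the raw response bodies, each of which can be passed to BeautifulSoup as in the question. The fetch_page parameter only exists so that the download step can be swapped out, for example for requests.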

"Show more results" when scraping mobile details from Flipkart
