Web scraping: getting the entire page with mechanize

Date: 2016-03-17 00:01:48

Tags: python-3.x web-scraping beautifulsoup mechanize

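My goal is to get all of the items from the page, but I only get the first 10 of 25. I think it has something to do with the table, which I believe is some kind of widget? I'm a beginner and still learning the basics.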

import time

import mechanize
from bs4 import BeautifulSoup

br = mechanize.Browser()  
br.set_handle_robots(False)  
br.addheaders = [("User-agent", "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13")]  

sign_in = br.open('https://sellercentral.amazon.com/gp/homepage.html?')  

br.select_form(name="signinWidget")  
br["username"] = 'spam' 
br["password"] = 'eggs'
logged_in = br.submit() 

orders_html = br.open("https://sellercentral.amazon.com/hz/inventory/ref=ag_invmgr_dnav_xx_?tbla_myitable=sort:{%22sortOrder%22%3A%22DESCENDING%22%2C%22sortedColumnId%22%3A%22date%22};search:;pagination:1;")

print('Login complete...')
time.sleep(5)

soup = BeautifulSoup(orders_html,'html.parser')
partNums = soup.find_all('span', {'class': 'mt-text-content mt-table-main'})

print(partNums)

for part in partNums:
    print(part.text)

print('Process Complete.')

1 answer:

Answer 0 (score: 0)

You can get the page's HTML with br.response().read():
orders_page = br.open("https://sellercentral.amazon.com/hz/inventory/ref=ag_invmgr_dnav_xx_?tbla_myitable=sort:{%22sortOrder%22%3A%22DESCENDING%22%2C%22sortedColumnId%22%3A%22date%22};search:;pagination:1;")  # loads page
orders_html = br.response().read()  # saves page source
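
Once you have the raw source, it may help to check how many item rows it actually contains before blaming the parser. The following is only a minimal sketch: it uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra packages, the names SpanCollector and extract_items are hypothetical helpers, and sample_html is made-up data standing in for the bytes returned by br.response().read().

```python
from html.parser import HTMLParser


class SpanCollector(HTMLParser):
    """Collect the text of every <span> whose class attribute matches exactly."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.depth = 0      # > 0 while inside a matching span
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # nested span inside a match
        elif dict(attrs).get("class") == self.wanted_class:
            self.depth = 1
            self.items.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.items[-1] += data


def extract_items(html, wanted_class="mt-text-content mt-table-main"):
    """Return the stripped text of each matching span in the page source."""
    if isinstance(html, bytes):
        html = html.decode("utf-8", errors="replace")
    parser = SpanCollector(wanted_class)
    parser.feed(html)
    parser.close()
    return [item.strip() for item in parser.items]


# Made-up sample standing in for br.response().read()
sample_html = b"""
<table>
  <tr><td><span class="mt-text-content mt-table-main">PART-001</span></td></tr>
  <tr><td><span class="mt-text-content mt-table-main">PART-002</span></td></tr>
  <tr><td><span class="other">ignore me</span></td></tr>
</table>
"""

items = extract_items(sample_html)
print(len(items), items)
```

If the count for the real page is still 10, the other 15 rows are not in the downloaded HTML at all. That would suggest they are added by JavaScript after the page loads, which mechanize does not execute, so reading the response differently would not recover them.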