How do I scrape a table with BeautifulSoup?

Time: 2018-12-31 03:17:24

Tags: python

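I am trying to use BeautifulSoup to retrieve the table data from the transaction history on an AliExpress item page, but the table is not in what request.urlopen().read() returns. I am not sure how to get it. Could it be because the table is inside an iframe? Please advise, as I don't want to use Selenium.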

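I have already done this with Selenium, but I don't want to load Chrome with driver.get every time. I want to read the data from the page source instead.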
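One way to narrow this down is to check the HTML that urlopen().read() returns: is the table there at all, and does the page contain any iframes whose src could be fetched separately? The following is only a minimal sketch. The function name inspect_static_html and the sample markup (including the example.com URL) are made up for illustration; only the id j-transaction-feedback comes from the XPaths in the script below.

```python
from bs4 import BeautifulSoup


def inspect_static_html(html, container_id="j-transaction-feedback"):
    """Return (has_table, iframe_srcs) for a page source string."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id=container_id)
    # The table counts as present only if the container actually holds a <table>.
    has_table = container is not None and container.find("table") is not None
    # Collect the src of every iframe; these are candidates to request directly.
    iframe_srcs = [frame["src"] for frame in soup.find_all("iframe", src=True)]
    return has_table, iframe_srcs


# Hypothetical page source: an empty container plus an iframe pointing elsewhere.
sample = """
<html><body>
  <div id="j-transaction-feedback"></div>
  <iframe src="//example.com/feedback-frame.htm?productId=123"></iframe>
</body></html>
"""

has_table, iframe_srcs = inspect_static_html(sample)
print(has_table)
print(iframe_srcs)
```

If has_table is False and an iframe src shows up, requesting that URL and parsing its response may be enough. If neither is present, the table is probably built by JavaScript after the page loads, and the static source alone will not contain it.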

import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://www.aliexpress.com/item/Modyle-2017-New-Fashion-Her-King-and-His-Queen-Stainless-Steel-Wedding-Rings-for-Women-Men/32827876813.html?spm=2114.search0104.3.2.b0db7b0bXHTbvz&ws_ab_test=searchweb0_0,searchweb201602_1_10065_10068_10890_319_10546_317_10548_5730311_10696_453_10084_454_10083_5729211_10618_10307_538_537_536_10059_10884_10887_100031_321_322_10103,searchweb201603_51,ppcSwitch_0&algo_expid=0ffd4f5b-afac-45be-ac7e-a0c97769e137-0&algo_pvid=0ffd4f5b-afac-45be-ac7e-a0c97769e137&transAbTest=ae803_3")
time.sleep(5)  # crude wait for the page to finish loading

# XPath of the rows in the transaction history table
ROWS = '//*[@id="j-transaction-feedback"]/div[2]/div[1]/div[1]/table/tbody/tr'

#row = len(driver.find_elements(By.XPATH, ROWS))

row = 20

print("member level" + "," + "username" + "," + "user country" + "," + "order no" + "," + "order time")

for n in range(1, row + 1):
    tr = ROWS + "[" + str(n) + "]"
    member = driver.find_element(By.XPATH, tr + "/td[1]/div/i").get_attribute("class")
    username = driver.find_element(By.XPATH, tr + "/td[1]/div/span").text
    usercountry = driver.find_element(By.XPATH, tr + "/td[1]/div/div/b").text
    orderno = driver.find_element(By.XPATH, tr + "/td[2]/div[1]").text
    ordertime = driver.find_element(By.XPATH, tr + "/td[2]/div[2]").text
    # The last two characters of the class name give the member level, e.g. "A2"
    print(member[-2:] + "," + username + "," + usercountry + "," + orderno + "," + ordertime)

With Selenium I can scrape these fields and get the output below:

member level,username,user country,order no,order time

A2,F***.,IT,1 piece,2018-12-30 13:46
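For comparison, here is a rough sketch of how the same five fields could be pulled out with BeautifulSoup once HTML containing the table is available, wherever it comes from. The markup in SAMPLE_HTML is hypothetical: it is shaped only to match the XPaths in the Selenium script above, and the class names on the i element are invented. parse_rows is likewise an illustrative name, not part of any library.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modelled on the XPaths used in the Selenium script.
SAMPLE_HTML = """
<div id="j-transaction-feedback">
  <table><tbody>
    <tr>
      <td><div><i class="member-level level-A2"></i><span>F***.</span><div><b>IT</b></div></div></td>
      <td><div>1 piece</div><div>2018-12-30 13:46</div></td>
    </tr>
  </tbody></table>
</div>
"""


def text_of(tag):
    """Return the stripped text of a tag, or an empty string if it is missing."""
    return tag.get_text(strip=True) if tag is not None else ""


def parse_rows(html):
    """Extract one dict per table row under #j-transaction-feedback."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="j-transaction-feedback")
    if container is None:
        return []
    rows = []
    for tr in container.find_all("tr"):
        cells = tr.find_all("td", recursive=False)
        if len(cells) < 2:
            continue  # skip header or malformed rows
        buyer = cells[0].find("div")
        order = cells[1].find_all("div", recursive=False)
        if buyer is None or len(order) < 2:
            continue
        icon = buyer.find("i")
        # bs4 returns class as a list; join it, then keep the last two characters.
        level = " ".join(icon.get("class", []))[-2:] if icon is not None else ""
        rows.append({
            "member level": level,
            "username": text_of(buyer.find("span")),
            "user country": text_of(buyer.find("b")),
            "order no": text_of(order[0]),
            "order time": text_of(order[1]),
        })
    return rows


rows = parse_rows(SAMPLE_HTML)
for r in rows:
    print(",".join(r.values()))
```

This only helps if the table markup is actually present in the HTML being parsed. An empty list from parse_rows on the downloaded page source would suggest the rows are loaded separately.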

0 answers:

No answers yet