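I can't scrape the title, price, and color of each handbag. The site is: https://www.coach.com/shop/women-handbags
I have tried different scraping tools, and I have tried placing the scraping code in different parts of the while loop.
By the time the while loop finishes, the code provided has scrolled through the entire page and returned to the very top.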
products = driver.find_elements_by_xpath('/html/body/div[1]/div[8]/div[4]/div/div/div/div[1]/div[1]/div')
for product in products:
    bag_dict = {}
    try:
        name = product.find_element_by_tag_name('a').text
        # `thing` is not defined anywhere in this snippet (`product` was probably intended)
        price = thing.find_element_by_xpath('.//span[@class="price-sales"]').text
        bag_dict['name'] = name
        bag_dict['price'] = price
    except:
        continue
print(bag_dict)
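I get either an empty dictionary or an error message saying that bag_dict is not defined.
Answer 0 (score: 0)
I found the request the site makes to load the handbags in sets of 24. This code iterates over all the sets and stores the name and price of each handbag in a DataFrame. Selenium is not necessary; I used requests and BeautifulSoup.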
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

frames = []

# The site serves products in sets of 24, so step the start offset through every set.
for next_set in range(0, 481, 24):
    params = {'start': next_set, 'format': 'page-element'}
    r = requests.get('https://www.coach.com/shop/women-handbags', params=params)
    soup = BeautifulSoup(r.text, 'html.parser')

    # The name is read from the <meta content="..."> tag inside each product-name element.
    names = [name.meta['content'] for name in soup.find_all(class_='product-name')]
    # The price is read from the data-sales-price attribute of a <span>.
    prices = [
        price.find('span', {'data-sales-price': re.compile(r'\d+\.\d+')})['data-sales-price']
        for price in soup.find_all(class_='product-price')
    ]

    frames.append(pd.DataFrame({'Names': names, 'Prices': prices}))
    print('Appended next set')

# DataFrame.append was removed in pandas 2.0, so the sets are combined with pd.concat.
handbags = pd.concat(frames, ignore_index=True)
print(handbags)
     Names                                              Prices
  0  TROUPE TOTE IN COLORBLOCK                           695.0
  1  TROUPE TOTE IN COLORBLOCK WITH SNAKESKIN DETAIL     750.0
  2  TROUPE TOTE                                         695.0
  3  TROUPE TOTE WITH KAFFE FASSETT PRINT                795.0
  4  TROUPE TOTE IN SIGNATURE CANVAS WITH PATCHWORK...   895.0
  5  TROUPE TOTE IN SIGNATURE CANVAS WITH KAFFE FAS...   795.0
  6  TROUPE TOTE IN SIGNATURE CANVAS                     695.0
  7  TROUPE CARRYALL WITH CROCODILE DETAIL              1100.0
  8  TROUPE CARRYALL                                     595.0
  9  TROUPE CARRYALL IN SIGNATURE CANVAS                 595.0
 10  TROUPE CARRYALL 35 IN COLORBLOCK WITH SNAKESKI...   850.0
 11  TROUPE CARRYALL 35 IN SIGNATURE CANVAS WITH KA...   995.0
 12  TROUPE SHOULDER BAG WITH KAFFE FASSETT PRINT        550.0
 13  TROUPE CROSSBODY WITH KAFFE FASSETT PRINT           595.0
 14  TROUPE CROSSBODY                                    495.0
 15  TROUPE CROSSBODY IN SIGNATURE CANVAS                495.0
 16  TABBY TOP HANDLE IN COLORBLOCK SNAKESKIN            695.0
 17  TABBY TOP HANDLE IN COLORBLOCK                      550.0
 18  TABBY TOP HANDLE IN COLORBLOCK                      550.0
 19  TABBY TOP HANDLE                                    550.0
 20  TABBY TOP HANDLE IN SIGNATURE CANVAS WITH KAFF...   650.0
 21  TABBY SHOULDER BAG 26 IN SIGNATURE CANVAS WITH...   450.0
 22  TABBY SHOULDER BAG 26 IN SNAKESKIN                  650.0
 23  TABBY SHOULDER BAG 26 IN COLORBLOCK WITH SNAKE...   450.0
 24  TABBY SHOULDER BAG 26 IN COLORBLOCK                 350.0
 25  TABBY SHOULDER BAG 26 IN COLORBLOCK WITH SNAKE...   450.0
 26  TABBY SHOULDER BAG 26                               350.0
 27  TABBY SHOULDER BAG 26                               350.0
 28  TABBY SHOULDER BAG WITH KAFFE FASSETT PRINT         550.0
 29  TABBY SHOULDER BAG IN SNAKESKIN                     595.0
 ..  ...                                                   ...
439  DINKY CHAIN STRAP                                    35.0
440  NOVELTY STRAP                                        95.0
441  NOVELTY STRAP                                        50.0
442  STRAP IN SIGNATURE CANVAS                            95.0
443  STRAP IN SNAKESKIN                                  150.0
444  STRAP WITH CHAIN                                    150.0
445  STRAP WITH WAVE PATCHWORK AND SNAKESKIN DETAIL      150.0
446  CASSIE CROSSBODY                                    350.0
447  NOVELTY STRAP WITH TEA ROSE AND TOOLING             150.0
448  CENTRAL TOTE WITH ZIP                               295.0
449  DREAMER WRISTLET                                    175.0
450  DREAMER WRISTLET IN COLORBLOCK                      175.0
451  DREAMER WRISTLET IN SIGNATURE CANVAS                175.0
452  DREAMER WRISTLET WITH SNAKESKIN DETAIL              225.0
453  RIVINGTON CONVERTIBLE POUCH                         250.0
454  RIVINGTON CONVERTIBLE POUCH IN SIGNATURE CANVAS     250.0
455  ROGUE POUCH                                         325.0
456  ROGUE POUCH                                         325.0
457  CHARLIE POUCH                                       175.0
458  CHARLIE POUCH IN COLORBLOCK SIGNATURE CANVAS        175.0
459  CHARLIE POUCH WITH MEADOW PRAIRIE PRINT             195.0
460  CHARLIE POUCH WITH SCATTERED RIVETS                 195.0
461  CHARLIE POUCH WITH SIGNATURE CANVAS BLOCKING        175.0
462  LARGE CHARLIE POUCH                                 225.0
463  LARGE CHARLIE POUCH WITH PATCHWORK STRIPES          275.0
464  LARGE CHARLIE POUCH WITH SCATTERED RIVETS           275.0
465  LARGE WRISTLET 30 IN SIGNATURE CANVAS WITH STA...   195.0
466  LARGE WRISTLET 30 WITH REXY AND CARRIAGE            195.0
467  KISSLOCK CLUTCH                                     225.0
468  KISSLOCK CLUTCH IN COLORBLOCK                       225.0
[469 rows x 2 columns]
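The upper bound of 481 in the loop is hard-coded to the size of the catalogue at the time of writing. One possible alternative, sketched below, is to keep requesting sets until one comes back empty. This rests on an assumption the answer does not confirm: that an offset past the last product returns a set with no products. The names `set_url`, `iter_sets`, and the `fetch_set` callback are hypothetical and introduced only for illustration; the fetcher is passed in so the paging logic can be tried without touching the network.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode

BASE_URL = "https://www.coach.com/shop/women-handbags"
PAGE_SIZE = 24  # set size reported in the answer


def set_url(start: int) -> str:
    """Build the URL for the set of products beginning at offset `start`."""
    query = urlencode({"start": start, "format": "page-element"})
    return f"{BASE_URL}?{query}"


def iter_sets(
    fetch_set: Callable[[str], List[str]],
    page_size: int = PAGE_SIZE,
    max_sets: int = 100,
) -> Iterator[List[str]]:
    """Yield one list of product names per set, stopping at the first empty set.

    `fetch_set` is a hypothetical callback that takes a URL and returns the
    product names found there. `max_sets` is a safety cap so the loop always
    terminates even if the site never returns an empty set.
    """
    for index in range(max_sets):
        names = fetch_set(set_url(index * page_size))
        if not names:  # assumption: an offset past the end yields no products
            break
        yield names


if __name__ == "__main__":
    # Stand-in fetcher backed by a fake 50-item catalogue (no network access).
    catalogue = [f"BAG {n}" for n in range(50)]

    def fake_fetch(url: str) -> List[str]:
        start = int(url.split("start=")[1].split("&")[0])
        return catalogue[start:start + PAGE_SIZE]

    for names in iter_sets(fake_fetch):
        print(len(names))  # prints 24, 24, 2
```

Against the real site, `fetch_set` would wrap the `requests.get` call and the BeautifulSoup parsing from the answer, and each yielded set could be turned into a DataFrame and combined at the end.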