Unable to parse the links of different items from a webpage using requests

Date: 2019-08-27 08:36:55

Tags: python python-3.x web-scraping

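I've written a script in Python that uses BeautifulSoup to scrape the links of different items from a webpage. When I run the script, I get only 6 of the 36 links.

The rest of the content on that page is generated dynamically, but I believe there must be an elegant way to capture it using requests alone.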

Website address

How can I get all of them using requests?

Here is what I've tried:

import requests
from bs4 import BeautifulSoup

link = "find the link above"

def get_links(link):
    res = requests.get(link,headers={"User-Agent":"Mozilla/5.0"})
    soup = BeautifulSoup(res.text,"lxml")
    for item_links in soup.select("#pull-results figure[data-pingdom-info='purchasable-deal']"):
        item_link = item_links.select_one("a[class^='cui-content']").get("href")
        yield item_link

if __name__ == '__main__':
    for elem in get_links(link):
        print(elem)
  

Note: I'm not looking for a solution that involves any browser simulator (e.g. Selenium).

1 Answer:

Answer 0 (score: 2)

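The data is loaded from a different URL via an AJAX request, which is why the initial page HTML contains only a few of the items. You also need to set a proper User-Agent. This prints all 36 links alongside their titles: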

import requests
from bs4 import BeautifulSoup

url = 'https://www.groupon.com/browse/search/partial?division=houston&badge=top-seller&query=med+spa&page=1'

headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'}

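def get_links(link):
    # The endpoint returns JSON; the deal cards are an HTML string under 'cardsHtml'.
    json_data = requests.get(link, headers=headers).json()
    soup = BeautifulSoup(json_data['cardsHtml'], 'lxml')
    # Pair each deal link with its title (relies on both lists being in the same order).
    for a, title in zip(soup.select('a.cui-content'), soup.select('.cui-udc-title')):
        yield a['href'], title.get_text(strip=True)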

if __name__ == '__main__':
    print('{: <4}{: <40}{}'.format('No.', 'Title', 'URL'))
    print('-' * 120)
    for i, (link, title) in enumerate(get_links(url), 1):
        print('{: <4}{: <40}{}'.format('%s.' % i, title, link))

Prints:

No. Title                                   URL
------------------------------------------------------------------------------------------------------------------------
1.  Body Envy Med Spa                       https://www.groupon.com/deals/body-envy-houston-5
2.  DermaNova Med Spa                       https://www.groupon.com/deals/dermanova-med-spa
3.  Limitless Medspa                        https://www.groupon.com/deals/limitless-med-spa-9
4.  New Heights Med Spa                     https://www.groupon.com/deals/new-heights-med-spa-6
5.  Wild Olive Beauty Haven                 https://www.groupon.com/deals/wild-olive-beauty-haven
6.  Urban Float                             https://www.groupon.com/deals/urban-float-houston-heights-3
7.  Glo Sun Spa Houston                     https://www.groupon.com/deals/glo-sun-spa-7
8.  Massage Heights Weslayan Plaza          https://www.groupon.com/deals/massage-heights-weslayan-plaza-4
9.  Hiatus Spa + Retreat                    https://www.groupon.com/deals/hiatus-spa-retreat-houston
10. Aura Brushed                            https://www.groupon.com/deals/aura-brushed
11. Heights Retreat Salon & Spa             https://www.groupon.com/deals/heights-retreat-new-ein
12. Woosah Massage and Wellness For Women   https://www.groupon.com/deals/woosah-massage-and-wellness
13. RD Laser Skin Solutions                 https://www.groupon.com/deals/rd-laser-skin-solutions-4
14. Clippers                                https://www.groupon.com/deals/clippers-2
15. Paige Larrick Electrology               https://www.groupon.com/deals/paige-larrick-electrology
16. Luxurious Sunless Tanning               https://www.groupon.com/deals/luxurious-sunless-tanning-2-4
17. LeLux Beautique                         https://www.groupon.com/deals/lelux-beautique-7
18. Paul Mitchell the School Houston        https://www.groupon.com/deals/paul-mitchell-the-school-houston
19. Faith Aesthetics                        https://www.groupon.com/deals/faith-aesthetics
20. Malibu Tan                              https://www.groupon.com/deals/malibu-tan-5
21. Maquillage Pro Beauty                   https://www.groupon.com/deals/maquillage-pro-beauty-2-14
22. E-Z Tan                                 https://www.groupon.com/deals/e-z-tan-3
23. Queen's Beauty Salon & Spa              https://www.groupon.com/deals/queens-beauty-salon-and-spa
24. MySmile Inc.                            https://www.groupon.com/deals/mysmile-inc-1
25. Blast Beauty Bar                        https://www.groupon.com/deals/blast-beauty-bar-2
26. No Hair Left Behind                     https://www.groupon.com/deals/no-hair-left-behind-1
27. BACS Clinic - Wellness Centre           https://www.groupon.com/deals/bacs-clinic
28. Soul The Beauty Bar And Yoni Spa        https://www.groupon.com/deals/soul-the-beauty-bar-and-yoni-spa
29. Touch Of Health Massage                 https://www.groupon.com/deals/touch-of-health-massage-1-3
30. Wink At U By Ryan                       https://www.groupon.com/deals/wink-at-u-by-ryan
31. Alanis Salon                            https://www.groupon.com/deals/alanis-salon-2
32. Perfected Lashes                        https://www.groupon.com/deals/perfected-lashes-1
33. Face It Makeup Studio                   https://www.groupon.com/deals/face-it-makeup-studio-3
34. Green Apple Salon                       https://www.groupon.com/deals/green-apple-salon-montrose-2
35. Snatched by J                           https://www.groupon.com/deals/snatched-by-j-body-fit
36. Premier Cosmetic                        https://www.groupon.com/deals/premier-cosmetic-4
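
The URL in the answer carries its search settings as query parameters (`division`, `badge`, `query`, `page`). As a rough sketch, the same URL can be assembled from those parts with the standard library, which makes it easier to try other values. The helper name `build_search_url` is hypothetical, and whether other `page` values or queries actually return further results is an assumption that the answer does not confirm.

```python
from urllib.parse import urlencode

# Endpoint path taken from the answer's URL.
BASE_URL = "https://www.groupon.com/browse/search/partial"


def build_search_url(query, division="houston", badge="top-seller", page=1):
    """Assemble the partial-search URL from its query parameters.

    Hypothetical helper: the parameter names come from the answer's URL;
    how the site reacts to other values has not been verified here.
    """
    params = {
        "division": division,
        "badge": badge,
        "query": query,
        "page": page,
    }
    # urlencode escapes spaces as '+', matching 'med+spa' in the answer.
    return "{}?{}".format(BASE_URL, urlencode(params))


if __name__ == "__main__":
    # Print the URLs for the first three pages of the same search.
    for page in range(1, 4):
        print(build_search_url("med spa", page=page))
```

With the default arguments, `build_search_url("med spa")` reproduces the URL used in the answer, so it could be passed to `get_links` unchanged.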