使用Beaufifulsoup和Requests从网站上搜索内容

时间:2017-04-19 00:04:48

标签: python web-scraping beautifulsoup python-requests

UPDATE =我的脚本提取以下文字,但我仍然在努力获取我需要的信息。

[<button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-ba7189.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-w-ba7212.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-r2-pk-w-ba7560.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-ultraboost-x-bb0879.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/books-all-gone-book-2016.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/converse-ctas-modern-hi-156645c.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/converse-ctas-modern-hi-156646c.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/new-balance-m576-lifestyle-m576-pgw.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-13-retro-low-310810-407.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-4-retro-308497-117.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-clyde-cny-fm-363637-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-creeper-white-black-364462-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-creeper-wrinkled-patent-364465-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-zoku-runner-ultk-is-bd5852.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/staple-fila-solid-pique-polo-1702p3795-blk.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/staple-fila-camo-poly-jkt-170203584-camo.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-eqt-support-adv-bb2791.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-eqt-support-adv-pk-ba7496.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-equipment-support-ultra-ba7474.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-r2-pk-bb2910.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/asics-gel-kayano-trainer-knit-h7s4n-4545.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-13-retro-414571-122.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-15-retro-881429-400.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-6-retro-384664-113.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-max-woven-boot-921854-002.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-max-woven-boot-921854-001.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-sock-racer-og-875837-001.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-nikelab-air-max-1-pinnacle-859554-400.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-clyde-premium-core-362632-03.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-cl-lthr-golden-neutrals-bd3744.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-club-c-85-gum-bs7635.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10346/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10341/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10336/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>]

我目前正在尝试提取&#34; form_key&#34;来自刮下文本的信息。在这个例子中,表格键是&#34; Ayxpa0t2JpTEfPBd&#34; - 这是我想要提取和打印的文字

您能否告诉我如何提取和打印信息。提前谢谢!

2 个答案:

答案 0 :(得分:1)

您可以使用正则表达式提取form_key

In [1]: s = 'http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/'

In [2]: import re

In [3]: m = re.search('.*/form_key/([^/]+)/.*', s)

In [4]: m.group(1)
Out[4]: 'Ayxpa0t2JpTEfPBd'

因此,为了匹配您的示例,您可以执行以下操作:

import re

s = """onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')"><span><span>SHOP NOW</span></span></button>"""
m = re.search('.*/form_key/([^/]+)/.*', s)

if m:
    print m.group(1)

答案 1 :(得分:0)

在这里,此代码搜索页面中的按钮,选择一个,获取onclick属性,然后获取表单键。正则表达式是罗伯特的回答,所以一定要用upvote来感谢他!

import requests
from bs4 import BeautifulSoup
import re

url = "http://www.urbanjunglestore.com/"

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

req = requests.request("GET", url, headers=headers, verify=False)
response = BeautifulSoup(req.content,
                         "html.parser")

all_buttons = response.find_all("button", title="SHOP NOW") 

one_button = all_buttons[0]

onclick_attribute = one_button['onclick']  # this gets the text of the onclick attribute

def get_form_key_from_onclick_attr(attr_text):
    """ use a regex to extract the form key from the onclick attribute text """
    results = re.search('.*/form_key/([^/]+)/.*', attr_text)
    return results.group(1)

get_form_key_from_onclick_attr(onclick_attribute)