UPDATE: My script extracts the following text, but I am still struggling to get the information I need.
[<button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-ba7189.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-w-ba7212.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-r2-pk-w-ba7560.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-ultraboost-x-bb0879.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/books-all-gone-book-2016.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/converse-ctas-modern-hi-156645c.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/converse-ctas-modern-hi-156646c.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/new-balance-m576-lifestyle-m576-pgw.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-13-retro-low-310810-407.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-4-retro-308497-117.html')" title="Shop Now" 
type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-clyde-cny-fm-363637-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-creeper-white-black-364462-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-creeper-wrinkled-patent-364465-01.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-zoku-runner-ultk-is-bd5852.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/staple-fila-solid-pique-polo-1702p3795-blk.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/staple-fila-camo-poly-jkt-170203584-camo.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-eqt-support-adv-bb2791.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-eqt-support-adv-pk-ba7496.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-equipment-support-ultra-ba7474.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" 
onclick="setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-r2-pk-bb2910.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/asics-gel-kayano-trainer-knit-h7s4n-4545.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-13-retro-414571-122.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-15-retro-881429-400.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-jordan-6-retro-384664-113.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-max-woven-boot-921854-002.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-max-woven-boot-921854-001.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-air-sock-racer-og-875837-001.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/nike-nikelab-air-max-1-pinnacle-859554-400.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/puma-clyde-premium-core-362632-03.html')" title="Shop Now" 
type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-cl-lthr-golden-neutrals-bd3744.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/reebok-club-c-85-gum-bs7635.html')" title="Shop Now" type="button"><span><span>Shop Now</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10346/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10341/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>, <button class="button btn-cart" onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10336/form_key/Ayxpa0t2JpTEfPBd/')" title="SHOP NOW" type="button"><span><span>SHOP NOW</span></span></button>]
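I am currently trying to extract the "form_key" value from the scraped text. In this example the form key is "Ayxpa0t2JpTEfPBd" - this is the text I want to extract and print.
Could you show me how to extract and print it? Thanks in advance!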
Answer 0 (score: 1)
You can extract the form_key with a regular expression:
In [1]: s = 'http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/'
In [2]: import re
In [3]: m = re.search('.*/form_key/([^/]+)/.*', s)
In [4]: m.group(1)
Out[4]: 'Ayxpa0t2JpTEfPBd'
So, to match your example, you could do the following:
import re
s = """onclick="setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')"><span><span>SHOP NOW</span></span></button>"""
m = re.search('.*/form_key/([^/]+)/.*', s)
if m:
    print(m.group(1))  # print() with parentheses works on both Python 2 and 3
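Since the scraped text contains many buttons and only some of them carry a form key, it may be more convenient to scan the whole text at once. The following is a minimal sketch, assuming the scraped output is available as one string; `find_form_keys` and the shortened `scraped` sample are hypothetical names introduced for illustration, not part of the original script.

```python
import re

# Capture the path segment right after "/form_key/", stopping at a slash or quote.
FORM_KEY_RE = re.compile(r"/form_key/([^/'\"]+)")


def find_form_keys(text):
    """Return the distinct form keys found in text, in order of first appearance."""
    seen = []
    for key in FORM_KEY_RE.findall(text):
        if key not in seen:
            seen.append(key)
    return seen


# Shortened stand-in for the scraped button list shown in the question.
scraped = (
    "<button onclick=\"setLocation('http://www.urbanjunglestore.com/it/"
    "adidas-nmd-cs2-pk-ba7189.html')\"></button>, "
    "<button onclick=\"setLocation('http://www.urbanjunglestore.com/it/checkout/"
    "cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/"
    "form_key/Ayxpa0t2JpTEfPBd/')\"></button>, "
    "<button onclick=\"setLocation('http://www.urbanjunglestore.com/it/checkout/"
    "cart/add/uenc/aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10346/"
    "form_key/Ayxpa0t2JpTEfPBd/')\"></button>"
)

print(find_form_keys(scraped))
```

Buttons that link to a product page instead of the cart simply produce no match, so they are skipped without any extra check.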
Answer 1 (score: 0)
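This code searches the page for the buttons, picks one, reads its onclick attribute, and then extracts the form key from it. The regex comes from Robert's answer, so be sure to thank him with an upvote!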
import requests
from bs4 import BeautifulSoup
import re
url = "http://www.urbanjunglestore.com/"
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
req = requests.request("GET", url, headers=headers, verify=False)
response = BeautifulSoup(req.content, "html.parser")
all_buttons = response.find_all("button", title="SHOP NOW")
one_button = all_buttons[0]
onclick_attribute = one_button['onclick'] # this gets the text of the onclick attribute
def get_form_key_from_onclick_attr(attr_text):
    """Use a regex to extract the form key from the onclick attribute text."""
    results = re.search('.*/form_key/([^/]+)/.*', attr_text)
    # re.search returns None when the attribute contains no form_key segment
    return results.group(1) if results else None
print(get_form_key_from_onclick_attr(onclick_attribute))
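The snippet above looks at a single button. If every button needs to be processed, a sketch along these lines might help; it assumes the cart URLs always follow the `/product/<id>/form_key/<key>/` layout seen in the question, and `parse_cart_onclick` and `onclick_values` are hypothetical names. In the real script the strings would come from `button['onclick']` for each button returned by `find_all`.

```python
import re

# Assumed URL layout, taken from the question: .../product/<digits>/form_key/<key>/
CART_RE = re.compile(r"/product/(\d+)/form_key/([^/]+)/")


def parse_cart_onclick(attr_text):
    """Return (product_id, form_key) from an onclick value, or None if absent."""
    match = CART_RE.search(attr_text)
    return match.groups() if match else None


# Stand-ins for the onclick attribute values of the scraped buttons.
onclick_values = [
    "setLocation('http://www.urbanjunglestore.com/it/adidas-nmd-cs2-pk-ba7189.html')",
    "setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/"
    "aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10356/form_key/Ayxpa0t2JpTEfPBd/')",
    "setLocation('http://www.urbanjunglestore.com/it/checkout/cart/add/uenc/"
    "aHR0cDovL3d3dy51cmJhbmp1bmdsZXN0b3JlLmNvbS9pdC8,/product/10346/form_key/Ayxpa0t2JpTEfPBd/')",
]

form_keys_by_product = {}
for value in onclick_values:
    parsed = parse_cart_onclick(value)
    if parsed is not None:  # product-page buttons carry no form key
        product_id, form_key = parsed
        form_keys_by_product[product_id] = form_key

for product_id, form_key in form_keys_by_product.items():
    print(product_id, form_key)
```

Returning `None` for non-matching buttons avoids the `AttributeError` that calling `.group(1)` on a failed search would raise.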