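I'm extracting information from a website, and the output is very long. How can I pick out the key part I'm interested in and assign it to a new object?
Here is part of the code I use to extract the information: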
soup = bs(response.text,"html.parser")
cartl = soup.find("div",{"class":"product-view"})
cart_link = cartl.find_all("form")
Here is my long output (I shortened it for this example; the full text it pulls is about 100 lines):
<form action="https://www.randomsite.com/checkout/cart/add/uenc/aHR0cHM6Ly93d3cudGhlZ29vZHdpbGxvdXQuY29tL25pa2UtYWlyLWpvcmRhbi0xMy1yZXRyby1iZy1oaXN0b3J5LW9mLWZsaWdodC13aGl0ZS1tZXRhbGljLXNpbHZlci11bml2ZXJzaXR5LXJlZC00MTQ1NzQtMTAzP19fX1NJRD1V/product/92797/form_key/NBlK6IE3LYdwf0Vh/" id="product_addtocart_form" method="post">
<input name="form_key" type="hidden" value="NBlK6IE3LYdwf0Vh"/>
<div class="no-display">
<input name="product" type="hidden" value="92797"/>
<input id="related-products-field" name="related_product" type="hidden" value=""/>
</div>
I want to assign this to a new object: https://www.randomsite.com/checkout/cart/add/uenc/aHR0cHM6Ly93d3cudGhlZ29vZHdpbGxvdXQuY29tL25pa2UtYWlyLWpvcmRhbi0xMy1yZXRyby1iZy1oaXN0b3J5LW9mLWZsaWdodC13aGl0ZS1tZXRhbGljLXNpbHZlci11bml2ZXJzaXR5LXJlZC00MTQ1NzQtMTAzP19fX1NJRD1V/product/92797/form_key/NBlK6IE3LYdwf0Vh/
Here is my updated code, based on the answer below (thanks):
from bs4 import BeautifulSoup
import requests
session = requests.session()
endpoint = "https://randomsite.com/"
response = session.get(endpoint)
soup0 = BeautifulSoup(response.text,"html.parser")
div = soup0.find("div",{"class":"product-view"})
html = div.find("form")
soup = BeautifulSoup(html, 'html.parser')
form = soup.find('form', { 'id': 'product_addtocart_form' })
action = form['action']
print(action)
Here is the new error. Any ideas where I went wrong?
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    soup = BeautifulSoup(html, 'html.parser')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/bs4/__init__.py", line 191, in __init__
    markup = markup.read()
TypeError: 'NoneType' object is not callable
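Answer 0 (score: 1)
You can use BeautifulSoup's find method to get a reference to the <form> tag (optionally filtering on a specific id if the page has more than one form). Then treat the form object like a dictionary to extract the action attribute.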
from bs4 import BeautifulSoup
html = '''
<form action="https://www.randomsite.com/checkout/cart/add/uenc/aHR0cHM6Ly93d3cudGhlZ29vZHdpbGxvdXQuY29tL25pa2UtYWlyLWpvcmRhbi0xMy1yZXRyby1iZy1oaXN0b3J5LW9mLWZsaWdodC13aGl0ZS1tZXRhbGljLXNpbHZlci11bml2ZXJzaXR5LXJlZC00MTQ1NzQtMTAzP19fX1NJRD1V/product/92797/form_key/NBlK6IE3LYdwf0Vh/" id="product_addtocart_form" method="post">
<input name="form_key" type="hidden" value="NBlK6IE3LYdwf0Vh"/>
<div class="no-display">
<input name="product" type="hidden" value="92797"/>
<input id="related-products-field" name="related_product" type="hidden" value=""/>
</div>
'''
soup = BeautifulSoup(html, 'html.parser')
form = soup.find('form', { 'id': 'product_addtocart_form' })
action = form['action']
print(action)  # print() call form, since the question runs Python 3.6
https://www.randomsite.com/checkout/cart/add/uenc/aHR0cHM6Ly93d3cudGhlZ29vZHdpbGxvdXQuY29tL25pa2UtYWlyLWpvcmRhbi0xMy1yZXRyby1iZy1oaXN0b3J5LW9mLWZsaWdodC13aGl0ZS1tZXRhbGljLXNpbHZlci11bml2ZXJzaXR5LXJlZC00MTQ1NzQtMTAzP19fX1NJRD1V/product/92797/form_key/NBlK6IE3LYdwf0Vh/
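Regarding the TypeError in the updated question: as far as I can tell, it comes from passing the result of div.find("form") into BeautifulSoup() a second time. find already returns a parsed Tag, not a string, so there is nothing left to re-parse; you can read the attribute from it directly. Below is a minimal sketch of that idea. The helper name extract_form_action, the shortened sample markup, and the None checks are my own assumptions for illustration, not something taken from the site in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the page markup in the question.
SAMPLE_HTML = '''
<div class="product-view">
<form action="https://www.randomsite.com/checkout/cart/add/product/92797/" id="product_addtocart_form" method="post">
<input name="form_key" type="hidden" value="NBlK6IE3LYdwf0Vh"/>
</form>
</div>
'''


def extract_form_action(markup, form_id="product_addtocart_form"):
    """Return the action URL of the form with the given id, or None if absent."""
    soup = BeautifulSoup(markup, "html.parser")  # parse the raw text exactly once

    # find() returns a Tag (already parsed) or None when nothing matches.
    div = soup.find("div", {"class": "product-view"})
    if div is None:
        return None

    # Search inside the div directly; do not wrap the Tag in BeautifulSoup() again.
    form = div.find("form", {"id": form_id})
    if form is None:
        return None

    # .get() avoids a KeyError if the form has no action attribute.
    return form.get("action")


action = extract_form_action(SAMPLE_HTML)
print(action)
```

With a live page you would pass response.text instead of SAMPLE_HTML. The None checks are there because the markup a site returns to requests may differ from what the browser shows, in which case find comes back empty.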