I'm currently doing some web scraping, and I have the following HTML:
<meta property="og:price:amount" content="1.89"/>
<meta property="og:price:standard_amount" content="6.31"/>
<meta property="og:price:currency" content="USD"/>
I'm using Beautiful Soup (Python).
The information I want to extract is 1.89 and 6.31 (the product prices).
Here is my code:
import requests
from bs4 import BeautifulSoup
page = requests.get('https://spanish.alibaba.com/product-detail/crazy-hot-selling-multifunctional-battery-powered-360-degree-rotation-led-light-makeup-mirror-60769168637.html?spm=a2700.8270666-66.2016122619262.17.5a4d5d09En8wm9')
# Create a BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')
#print(soup.get_text())
# get the repo list
v2 = soup.find_all("meta", {"property": "og:price:amount", "content": True}['content'] )
print("v2 is",v2)
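The error is in the .find_all() call, and I'm not sure how to extract the data. I have also tried .find().
This is the information I found on how the Beautiful Soup function works: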
Signature: find_all(name, attrs, recursive, string, limit, **kwargs)
Please help me set up the .find() call. Thanks!
Answer 0: (score: 1)
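Use find() instead of find_all(). find_all() returns a list of elements, so it cannot be subscripted with ['content'] directly, whereas find() returns the first matching tag: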
v2 = soup.find("meta", {"property": "og:price:amount", "content": True})['content']
print("v2 is",v2)
Or you can use a CSS selector:
v2 = soup.select_one('meta[property="og:price:amount"][content]')['content']
print("v2 is",v2)
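To get both prices the question asks for, the same find() call can be repeated for each property. In the question's code the ['content'] subscript sits on the attrs dictionary rather than on the result, so the dictionary lookup yields True before find_all() ever runs. The following is a minimal sketch, assuming the page contains the three meta tags quoted in the question (they are parsed from a string here instead of a live request); the helper name meta_content is hypothetical and not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# The three meta tags quoted in the question, used here instead of a live request.
HTML = '''
<meta property="og:price:amount" content="1.89"/>
<meta property="og:price:standard_amount" content="6.31"/>
<meta property="og:price:currency" content="USD"/>
'''

soup = BeautifulSoup(HTML, 'html.parser')


def meta_content(soup, prop):
    """Hypothetical helper: content of the first matching meta tag, or None."""
    tag = soup.find("meta", {"property": prop, "content": True})
    # find() returns None when nothing matches, so guard before subscripting.
    return tag["content"] if tag is not None else None


amount = meta_content(soup, "og:price:amount")
standard_amount = meta_content(soup, "og:price:standard_amount")
print("amount is", amount)
print("standard_amount is", standard_amount)
```

The None check matters on a live page: if the site serves different markup to a script than to a browser, find() returns None and subscripting it would raise a TypeError.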
Answer 1: (score: 1)
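.find_all() returns a list, so you need to iterate over it. The other option, as already suggested, is .find(), which returns only the first matching element no matter how many there are in the HTML. But since you want multiple elements, you can also pass a regular expression to find every tag whose property contains 'og:price:':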
import requests
from bs4 import BeautifulSoup
import re
page = requests.get('https://spanish.alibaba.com/product-detail/crazy-hot-selling-multifunctional-battery-powered-360-degree-rotation-led-light-makeup-mirror-60769168637.html?spm=a2700.8270666-66.2016122619262.17.5a4d5d09En8wm9')
# Create a BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')
#print(soup.get_text())
# find every meta tag whose property contains 'og:price:'
regex = re.compile('.*og:price:.*')
v2 = soup.find_all("meta", {"property": regex, "content": True})
for each in v2:
    print('%s is %s' % (each['property'].split(':')[-1], each['content']))
Output:
amount is 1.89
standard_amount is 6.31
currency is USD
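The same multi-tag lookup can also be written without a regular expression by using a CSS attribute-prefix selector with select(). This is a sketch only, assuming the meta tags quoted in the question and a Beautiful Soup version with CSS selector support (4.7 or later); the variable name prices is just for illustration.

```python
from bs4 import BeautifulSoup

# The three meta tags quoted in the question, used here instead of a live request.
HTML = '''
<meta property="og:price:amount" content="1.89"/>
<meta property="og:price:standard_amount" content="6.31"/>
<meta property="og:price:currency" content="USD"/>
'''

soup = BeautifulSoup(HTML, 'html.parser')

# [property^="og:price:"] matches tags whose property starts with that prefix;
# [content] additionally requires a content attribute to be present.
prices = {
    tag["property"].split(":")[-1]: tag["content"]
    for tag in soup.select('meta[property^="og:price:"][content]')
}

for name, value in prices.items():
    print('%s is %s' % (name, value))
```

Collecting the values into a dictionary makes it easy to pick out just the two prices afterwards, for example float(prices["amount"]) and float(prices["standard_amount"]).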