Scraping with Beautiful Soup

Time: 2019-12-17 15:33:11

Tags: python web-scraping beautifulsoup

I am currently doing some web scraping, and I have the following HTML:

<meta property="og:price:amount" content="1.89"/>
<meta property="og:price:standard_amount" content="6.31"/>
<meta property="og:price:currency" content="USD"/>

I am using Beautiful Soup (Python).

The values I want to extract are 1.89 and 6.31 (the product prices).

This is my code:

import requests
from bs4 import BeautifulSoup


page = requests.get('https://spanish.alibaba.com/product-detail/crazy-hot-selling-multifunctional-battery-powered-360-degree-rotation-led-light-makeup-mirror-60769168637.html?spm=a2700.8270666-66.2016122619262.17.5a4d5d09En8wm9')

# Create a BeautifulSoup object
soup = BeautifulSoup(page.text, 'html.parser')
#print(soup.get_text())
# get the repo list


v2 = soup.find_all("meta", {"property": "og:price:amount", "content": True}['content'] )
print("v2 is",v2)

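The error is in the .find_all() call, and I am not sure how to extract the data. I have also tried .find().

This is the information I found about how the Beautiful Soup function works: Signature: find_all(name, attrs, recursive, string, limit, **kwargs)

Please help me set up the .find() call correctly. Thanks!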

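2 answers:

Answer 0 (score: 1)

Use find() instead of find_all().

find_all() returns a list of elements, so it cannot be subscripted with ['content'] directly. find() returns a single element, and the ['content'] lookup belongs after the closing parenthesis of the call, not inside it.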

v2 = soup.find("meta", {"property": "og:price:amount", "content": True})['content'] 
print("v2 is",v2)
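
To make the difference concrete, here is a minimal sketch that runs against the three meta tags from the question instead of the live page. The variable names (html, misplaced, amount, amounts) are only illustrative. It shows what the misplaced ['content'] in the question evaluates to, and how find() and find_all() differ in what they return.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question, so no network request is needed.
html = '''
<meta property="og:price:amount" content="1.89"/>
<meta property="og:price:standard_amount" content="6.31"/>
<meta property="og:price:currency" content="USD"/>
'''
soup = BeautifulSoup(html, 'html.parser')

# In the question, ['content'] sits inside the parentheses, so Python
# subscripts the dict literal first and hands the result to find_all().
misplaced = {"property": "og:price:amount", "content": True}['content']
print(misplaced)  # True, not a dict of attribute filters

# find() returns a single Tag (or None), which can be subscripted directly.
tag = soup.find("meta", {"property": "og:price:amount", "content": True})
amount = tag['content']

# find_all() returns a list-like ResultSet, so loop over (or index) it first.
tags = soup.find_all("meta", {"property": "og:price:amount", "content": True})
amounts = [t['content'] for t in tags]

print(amount, amounts)
```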

Or you can use a CSS selector:

v2 = soup.select_one('meta[property="og:price:amount"][content]')['content']
print("v2 is",v2)
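
If both prices are needed, the selector approach can be extended with select(), which returns every match rather than only the first. The following is a sketch under the assumption that all the relevant tags share the og:price: prefix, as they do in the question's snippet. The prices dictionary is just an illustrative way to collect the results.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question.
html = '''
<meta property="og:price:amount" content="1.89"/>
<meta property="og:price:standard_amount" content="6.31"/>
<meta property="og:price:currency" content="USD"/>
'''
soup = BeautifulSoup(html, 'html.parser')

# ^= is the CSS "attribute value starts with" operator;
# [content] additionally requires that the content attribute is present.
price_tags = soup.select('meta[property^="og:price:"][content]')

# Map each property name to its content value.
prices = {tag['property']: tag['content'] for tag in price_tags}
print(prices)
```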

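Answer 1 (score: 1)

.find_all() returns a list, so you need to iterate over it. The other option, as already suggested, is .find(), which returns only the first matching element no matter how many there are in the HTML. But since you want several elements, you can also pass a regular expression to find every element whose property contains 'og:price:'.
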
import requests
from bs4 import BeautifulSoup
import re

page = requests.get('https://spanish.alibaba.com/product-detail/crazy-hot-selling-multifunctional-battery-powered-360-degree-rotation-led-light-makeup-mirror-60769168637.html?spm=a2700.8270666-66.2016122619262.17.5a4d5d09En8wm9')

# Create a BeautifulSoup object from the downloaded HTML
soup = BeautifulSoup(page.text, 'html.parser')

# Match every meta tag whose property contains 'og:price:' and that has a content attribute
regex = re.compile('.*og:price:.*')
v2 = soup.find_all("meta", {"property": regex, "content": True})

for each in v2:
    print('%s is %s' %(each['property'].split(':')[-1], each['content']))

Output:

amount is 1.89
standard_amount is 6.31
currency is USD
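
One caveat: if the request fails or the page is served without these tags, find() returns None, and subscripting None raises a TypeError. Below is a small defensive sketch. The helper name get_price is hypothetical and not part of Beautiful Soup. It returns the price as a float, or None when the tag is missing or its content is not numeric.

```python
from bs4 import BeautifulSoup


def get_price(soup, prop):
    """Return the content of <meta property=prop> as a float, or None.

    Hypothetical helper for illustration; not a Beautiful Soup API.
    """
    tag = soup.find("meta", {"property": prop, "content": True})
    if tag is None:
        # The tag is absent, e.g. the page layout differs or the request was blocked.
        return None
    try:
        return float(tag["content"])
    except ValueError:
        # The content is not a number (for example the currency code "USD").
        return None


# Sample markup copied from the question.
html = '''
<meta property="og:price:amount" content="1.89"/>
<meta property="og:price:standard_amount" content="6.31"/>
<meta property="og:price:currency" content="USD"/>
'''
soup = BeautifulSoup(html, 'html.parser')

amount = get_price(soup, "og:price:amount")
standard_amount = get_price(soup, "og:price:standard_amount")
missing = get_price(soup, "og:price:does_not_exist")
print(amount, standard_amount, missing)
```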