from bs4 import BeautifulSoup
import urllib
from openpyxl import Workbook
from openpyxl.compat import range
from openpyxl.cell import get_column_letter
r = urllib.urlopen('https://www.vrbo.com/576329').read()
soup = BeautifulSoup(r)
rate = soup.find_all('body')
print rate
print type(soup)
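I am trying to capture the values held in the container, for example data-bedroom="3", and specifically the value given inside the quotes, but I don't know what these are formally called or how to parse them.
Below is a partial sample of what printing the "body" gives. So I know the values are there; capturing the specific piece is what I can't manage: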
data-ratemaximum="$260" data-rateminimum="$220" data-rateunits="night" data-rawlistingnumber="576329" data-requestuuid="73bcfaa3-9637-40a8-801c-ae86f93caf39" data-searchpdptab="C" data-serverday="18" data-showbookingphone="False"
Answer 0 (score: 1)
To get the value of an attribute, use rate['attr'], for example:
from bs4 import BeautifulSoup
import urllib
from openpyxl import Workbook
from openpyxl.compat import range
from openpyxl.cell import get_column_letter
r = urllib.urlopen('https://www.vrbo.com/576329').read()
soup = BeautifulSoup(r, "html.parser")
rate = soup.find('body')
print rate['data-ratemaximum']
print rate['data-rateunits']
print rate['data-rawlistingnumber']
print rate['data-requestuuid']
print rate['data-searchpdptab']
print rate['data-serverday']
print rate['data-showbookingphone']
print rate
print type(soup)
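For readers on Python 3, here is a minimal sketch of the same attribute lookup. It parses a small inline HTML string instead of downloading the page; that string is a hypothetical stand-in, and only the attribute names and values are taken from the question's sample. It also shows `.get()`, which may be handy when an attribute is not guaranteed to be present.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; only the attribute names
# and values come from the question's sample output.
html = '''<html><body data-ratemaximum="$260" data-rateminimum="$220"
data-rawlistingnumber="576329" data-serverday="18"><p>listing</p></body></html>'''

soup = BeautifulSoup(html, "html.parser")
body = soup.find("body")

# Subscript access raises KeyError if the attribute is missing.
print(body["data-ratemaximum"])

# .get() returns None (or a supplied default) instead of raising.
print(body.get("data-bedrooms"))
print(body.get("data-bedrooms", "unknown"))

# .attrs is a dict holding every attribute on the tag.
print(sorted(body.attrs))
```

Note that `find` returns a single tag, while `find_all` returns a list-like result that has to be indexed or looped over before attributes can be read.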
Answer 1 (score: 0)
You need to pick apart your result. It may help to know that what you are looking for is called an attribute of a tag in HTML:
body_tag = rate[0]
data_bedrooms = body_tag.attrs['data-bedrooms']
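The code above assumes there is only one <body>. If there are more, you will need a for loop over rate. You may also want to convert the value to an integer with int().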
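The int() conversion can be sketched as follows. Attribute values like these come back as strings, and some of the values in the question carry a leading "$", which int() alone would reject. The helper name `to_int` and the decision to strip the "$" are assumptions made for illustration, not part of the original answer.

```python
def to_int(value):
    """Convert an attribute string such as "3" or "$260" to an int.

    Hypothetical helper: stripping a leading "$" is an assumption based on
    the price-like values in the question's sample.
    """
    return int(value.strip().lstrip("$"))


# Attribute values as they might come back from body_tag.attrs
# (values taken from the question's sample output).
attrs = {"data-bedrooms": "3", "data-ratemaximum": "$260", "data-rateminimum": "$220"}

numbers = {name: to_int(value) for name, value in attrs.items()}
print(numbers)
```

A value that is not numeric after stripping, such as "night", would still raise ValueError, so only attributes known to hold numbers should be passed in.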
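Answer 2 (score: -1)
Not sure whether you only want data-bedrooms from the soup object. I took a cursory look at the output and could infer that the data-* items you mention are attributes, not tags. If the document structure is consistent, you could locate the tag each attribute belongs to and make these lookups more efficient: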
import re
# Regex pattern matching attribute names that start with "data-"
data_tag_pattern = re.compile(r'^data-')
# List of the attributes we want
attribs_wanted = "data-bedrooms data-rateminimum data-rateunits data-rawlistingnumber data-requestuuid data-searchpdptab data-serverday data-showbookingphone".split()
# Search the entire tree
for item in soup.findAll():
    # Use descendants to recurse downwards
    for child in item.descendants:
        try:
            for attribute in child.attrs:
                if data_tag_pattern.match(attribute) and attribute in attribs_wanted:
                    print("{}: {}".format(attribute, child[attribute]))
        except AttributeError:
            # Non-tag nodes (e.g. text) have no attrs; skip them
            pass
This produces output such as:
data-showbookingphone: False
data-bedrooms: 3
data-requestuuid: 2b6f4d21-8b04-403d-9d25-0a660802fb46
data-serverday: 18
data-rawlistingnumber: 576329
data-searchpdptab: C
HTH!
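A shorter variant of the same idea, offered as a sketch rather than a definitive version: `find_all(True)` yields every tag once, so the nested descendants loop and the try/except are not needed, and a plain `startswith("data-")` check can replace the regex. The inline HTML is a hypothetical sample whose attribute names follow the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; attribute names follow the question.
html = '''<html><body data-bedrooms="3" data-serverday="18" class="pdp">
<div data-rateunits="night" id="rates">rates</div></body></html>'''

soup = BeautifulSoup(html, "html.parser")

found = {}
# find_all(True) matches every tag in the tree exactly once.
for tag in soup.find_all(True):
    for name, value in tag.attrs.items():
        # Keep only data-* attributes; class, id, etc. are skipped.
        if name.startswith("data-"):
            found[name] = value

for name in sorted(found):
    print("{}: {}".format(name, found[name]))
```

If the same data-* name appears on several tags, the dict keeps only the last value seen; collecting into a list of (tag name, attribute, value) tuples would preserve them all.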