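I'm new to Python and want to scrape real estate data from a listings website. I've managed to extract text from the page, but the objects returned are not what I expected.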
# import modules
from lxml import html
import requests
# specify webpage to scrape
url = 'https://www.mlslistings.com/Search/Result/e1fdabc8-9b53-470f-9728-b6ab1a5d1204/1'
page = requests.get(url)
tree = html.fromstring(page.content)
# scrape desired information
address_raw = tree.xpath('//a[@class="search-nav-link"]//text()')
price_raw = tree.xpath('//span[@class="font-weight-bold listing-price d-block pull-left pr-25"]//text()')
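As expected, the objects address_raw and price_raw are lists. However, the values in these lists are not plain strings that show the address and price at a glance. Instead, each one is reported as [_ElementUnicodeResult object of lxml.etree module]. Typing the object name in the interpreter (e.g. address_raw), or calling print(address_raw), does display the addresses in the list. How can I build a simple list of addresses and prices as strings, without the list values showing up as [_ElementUnicodeResult object of lxml.etree module]?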
Answer 0 (score: 0)
You can use str() to convert each object to a plain string, and map() to apply that function to every element of the list:
from lxml import html
import requests
url = 'https://www.mlslistings.com/Search/Result/e1fdabc8-9b53-470f-9728-b6ab1a5d1204/1'
page = requests.get(url)
tree = html.fromstring(page.content)
address_raw = list(map(str, tree.xpath('//a[@class="search-nav-link"]//text()')))
price_raw = list(map(str, tree.xpath('//span[@class="font-weight-bold listing-price d-block pull-left pr-25"]//text()')))
print(type(address_raw[0])) # => <class 'str'>
print(type(price_raw[0])) # => <class 'str'>
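
As a complement, here is a minimal offline sketch of two related options: a list comprehension that converts and strips whitespace in one pass, and lxml's smart_strings=False keyword, which asks xpath() for plain strings directly. The SAMPLE markup, the shortened listing-price class, and the addresses and prices in it are hypothetical stand-ins, not the real page, so the selectors would need adjusting for the live site.

```python
from lxml import html

# Hypothetical stand-in markup (assumption): the real page structure may differ.
SAMPLE = """
<div>
  <a class="search-nav-link"> 123 Main St, San Jose </a>
  <span class="listing-price">$1,250,000</span>
  <a class="search-nav-link">456 Oak Ave, Palo Alto</a>
  <span class="listing-price"> $2,100,000 </span>
</div>
"""

tree = html.fromstring(SAMPLE)

# Option 1: convert each result with str() and trim surrounding whitespace,
# skipping text nodes that contain only whitespace.
raw_addresses = tree.xpath('//a[@class="search-nav-link"]//text()')
addresses = [str(text).strip() for text in raw_addresses if text.strip()]

# Option 2: disable lxml's "smart strings" so xpath() returns plain str values.
raw_prices = tree.xpath('//span[@class="listing-price"]//text()',
                        smart_strings=False)
prices = [text.strip() for text in raw_prices if text.strip()]

# Pair each address with its price (assumes both lists line up one-to-one).
listings = list(zip(addresses, prices))

print(addresses)  # ['123 Main St, San Jose', '456 Oak Ave, Palo Alto']
print(prices)     # ['$1,250,000', '$2,100,000']
```

Pairing with zip() only makes sense if every listing has exactly one address and one price; if the page can omit either, it is probably safer to iterate over each listing's container element and query inside it.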