I am trying to scrape some housing information from this site (https://cd.lianjia.com/ershoufang/106101326994.html) and ran into a problem: when I use the beautifulsoup4 module to get the "contents" in the statement "<span> contents <span>==$0", I always get 0 instead of the contents. Many thanks!
This is my code:
<span> contents <span>==$0
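Answer 0: (score: 0)
What you are doing now is printing the index of the object created by this line: result['totalcount'] = soup.select('.totalCount')[0].select('span')[0].text
Instead, you should capture the content, or locate the element by attributes such as class, id, and so on: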
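import requests
from bs4 import BeautifulSoup

def getSigleHouseDetail(houseurl):
    res = requests.get(houseurl)
    # Pass the raw bytes: from_encoding is ignored when the input is already a str.
    soup = BeautifulSoup(res.content, 'html.parser', from_encoding='utf-8')
    # 'className' is a placeholder; replace it with the class of the target span.
    method_divs = soup.find_all('span', attrs={'class': 'className'})
    # Return None instead of raising IndexError when no span matches.
    return method_divs[0].text if method_divs else None

url = 'https://cd.lianjia.com/ershoufang/106101326994.html'
print(getSigleHouseDetail(url))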
The line "return method_divs[0].text" returns the text of the first span whose class is className, and the final print call displays it.
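The same lookup can be tried offline before hitting the real site. The following is only a minimal sketch: the HTML fragment, the value 42, and the helper name first_text are made up for illustration, and the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration; not copied from the real page.
html = '<div class="totalCount">Total <span>42</span> listings</div>'
soup = BeautifulSoup(html, 'html.parser')

def first_text(soup, css):
    """Return the stripped text of the first element matching css, or None."""
    node = soup.select_one(css)
    return node.get_text(strip=True) if node else None

total = first_text(soup, '.totalCount span')     # span inside .totalCount
missing = first_text(soup, '.noSuchClass span')  # no match, so None
print(total, missing)
```

Returning None for a missing element makes it easier to tell "the selector is wrong" apart from "the element is there but empty".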
Answer 1: (score: 0)
Thank you for all your answers. I found that the contents in the statement '<span> contents <span>==$0' can be found in the page's JavaScript data.
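When a value is filled in by JavaScript, the raw HTML returned by requests may still carry it inside an inline script block. The sketch below shows one way that could be read, under stated assumptions: the variable name pageData, the totalCount key, and the sample values are hypothetical and are not taken from the real page.

```python
import json
import re
from bs4 import BeautifulSoup

# Hypothetical page: the span holds a placeholder 0 in the raw HTML,
# while the real value sits in an inline script as a JSON-like object.
html = '''
<div class="totalCount"><span>0</span></div>
<script>var pageData = {"totalCount": 128, "city": "cd"};</script>
'''

def extract_page_data(html):
    """Return the object assigned to the assumed pageData variable, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    for script in soup.find_all('script'):
        text = script.string or ''
        # Capture the object literal between "var pageData =" and the closing "};".
        match = re.search(r'var pageData\s*=\s*(\{.*?\});', text, re.S)
        if match:
            return json.loads(match.group(1))
    return None

data = extract_page_data(html)
print(data['totalCount'])
```

This only works when the embedded object is valid JSON; if the page loads its data through a separate request, fetching that endpoint directly is usually the simpler route.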