How to use BeautifulSoup to get 'contents' from '<span> contents <span> == $0'

Time: 2018-10-20 11:33:24

Tags: python

When I tried to scrape some house information from this site (https://cd.lianjia.com/ershoufang/106101326994.html), I ran into a problem: when I use the beautifulsoup4 module to get the "contents" in the statement '<span> contents <span>==$0', I always get 0 instead of the contents. Many thanks!

Here is my code:

<span> contents <span>==$0

2 answers:

Answer 0: (score: 0)

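What you are doing now is printing the index of the object being created by this line:

    result['totalcount'] = soup.select('.totalCount')[0].select('span')[0].text

Instead, you should capture the content, or select the element by attributes such as class, id, and others: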

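import requests
from bs4 import BeautifulSoup

def getSigleHouseDetail(houseurl):
    # Fetch the listing page; the timeout avoids hanging on a stalled connection.
    res = requests.get(houseurl, timeout=10)

    # Pass the raw bytes: from_encoding is ignored (with a warning) for str input.
    soup = BeautifulSoup(res.content, 'html.parser', from_encoding='utf-8')
    # 'className' is a placeholder; substitute the class of the span you want.
    method_divs = soup.body.find_all('span', attrs={'class': 'className'})
    # Raises IndexError if no span with that class exists.
    return method_divs[0].text

url = 'https://cd.lianjia.com/ershoufang/106101326994.html'
print(getSigleHouseDetail(url))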

The line

    return method_divs[0].text

returns the text of the first span whose class is className, and the final print call prints it.
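The same lookup can also be written with a CSS selector. The following is only a minimal sketch: the markup in SAMPLE_HTML is hypothetical (it is not copied from the real page), and first_span_text is an illustrative helper name. It uses select_one, which returns None when nothing matches, so the missing-element case can be handled without an IndexError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may differ.
SAMPLE_HTML = """
<div class="totalCount">Total <span>37</span> listings</div>
<div class="other"><span>ignored</span></div>
"""

def first_span_text(html, selector=".totalCount span"):
    """Return the stripped text of the first element matching selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(selector)  # None when the selector matches nothing
    return node.get_text(strip=True) if node is not None else None

print(first_span_text(SAMPLE_HTML))
print(first_span_text(SAMPLE_HTML, ".missing span"))
```

Note that this only sees what is in the HTML the server returns. If the span is filled in later by a script in the browser, the parsed text will be whatever placeholder the static HTML contains.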

Answer 1: (score: 0)

Thanks for all your answers. I found that the contents in the statement '<span> contents <span>==$0' can be found in the JavaScript data.
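For readers in the same situation, here is a rough sketch of what reading a value out of inline JavaScript data might look like. Everything specific here is an assumption: the sample markup, the variable name pageData, and the helper extract_page_data are made up for illustration, and the real page may embed its data differently. As a side note, the "== $0" suffix is the marker Chrome DevTools shows next to the currently selected element; it is not part of the page source.

```python
import json
import re
from bs4 import BeautifulSoup

# Hypothetical page: the span holds a placeholder, the real value sits in a script.
SAMPLE_HTML = """
<html><body>
<div class="totalCount"><span>0</span></div>
<script>
  var pageData = {"totalCount": 37, "houseId": "106101326994"};
</script>
</body></html>
"""

def extract_page_data(html, var_name="pageData"):
    """Find `var <var_name> = {...};` in any script tag and parse it as JSON."""
    soup = BeautifulSoup(html, "html.parser")
    # Non-greedy match up to the first "};" works for simple object literals
    # that are valid JSON; it is not a general JavaScript parser.
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\{.*?\})\s*;", re.DOTALL
    )
    for script in soup.find_all("script"):
        source = script.string or ""  # .string is None for empty/external scripts
        match = pattern.search(source)
        if match:
            return json.loads(match.group(1))
    return None

data = extract_page_data(SAMPLE_HTML)
print(data["totalCount"] if data else "not found")
```

This approach only works when the embedded object is valid JSON. If the data is loaded by a separate request instead, inspecting the Network tab in the browser's developer tools and calling that endpoint directly is usually the simpler route.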