Question

While trying to scrape some housing information from this site (https://cd.lianjia.com/ershoufang/106101326994.html), I ran into a problem: when I use the beautifulsoup4 module to get the "contents" in the statement '<span> contents <span>==$0', I always get 0 instead of the contents. Many thanks!

Here is my code:

<span> contents <span>==$0

Answer 1

Right now you are printing the index of the object being created on this line: result['totalcount'] = soup.select('.totalCount')[0].select('span')[0].text

Instead, you should capture the contents, or select the element by attributes such as class, id, and others:

import requests
from bs4 import BeautifulSoup

def getSigleHouseDetail(houseurl):
    # Download the listing page.
    res = requests.get(houseurl)

    # res.text is already a decoded string, so from_encoding is dropped
    # (it only applies to bytes input).
    soup = BeautifulSoup(res.text, 'html.parser')
    # 'className' is a placeholder: use the class of the span you want.
    method_divs = soup.body.find_all('span', attrs={'class': 'className'})
    return method_divs[0].text

url = 'https://cd.lianjia.com/ershoufang/106101326994.html'
print(getSigleHouseDetail(url))

The line return method_divs[0].text returns the text of the first span whose class is className, and the print call then displays it.
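
A minimal sketch of the class-based lookup, run against an inline HTML string so that it works without a network request. The markup, the class names total and price, and the helper first_span_text are assumptions made for illustration, not the real structure of the lianjia page. Note also that the "== $0" suffix is what Chrome DevTools shows next to the currently selected element; it is not part of the HTML, so it never appears in what requests downloads.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure may differ.
SAMPLE_HTML = """
<html><body>
  <div class="total"><span>123</span> listings</div>
  <span class="price">88.5</span>
</body></html>
"""

def first_span_text(html, class_name):
    """Return the stripped text of the first <span> with the given class, or None."""
    soup = BeautifulSoup(html, 'html.parser')
    span = soup.find('span', class_=class_name)
    # Guard against a missing element instead of indexing into an empty list.
    return span.get_text(strip=True) if span is not None else None

price = first_span_text(SAMPLE_HTML, 'price')
missing = first_span_text(SAMPLE_HTML, 'no-such-class')

# A CSS selector reaches a span nested inside an element with a known class.
nested = BeautifulSoup(SAMPLE_HTML, 'html.parser').select_one('.total span').get_text(strip=True)

print(price, missing, nested)
```

Returning None for a missing element avoids the IndexError that method_divs[0] would raise when nothing matches.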

Answer 2

Thanks for all the answers. I found that the contents in the statement '<span> contents <span>==$0' can be found in the JavaScript data.
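
A rough sketch of what reading the value from the JavaScript data could look like, assuming the page embeds it as a JSON object literal assigned to a variable inside a script tag. The variable name pageData, the sample markup, and the helper extract_page_data are all hypothetical; the real page may store its data differently.

```python
import json
import re
from bs4 import BeautifulSoup

# Hypothetical page: the visible <span> stays empty until JavaScript fills it in.
SAMPLE_HTML = """
<html><body>
  <span id="count"></span>
  <script>
    var pageData = {"totalCount": 42, "title": "sample listing"};
    document.getElementById("count").textContent = pageData.totalCount;
  </script>
</body></html>
"""

# Captures the object literal in `var pageData = {...};`.
# The non-greedy match is a simplification and would stop early on nested braces.
PAGE_DATA_RE = re.compile(r'var\s+pageData\s*=\s*(\{.*?\});', re.DOTALL)

def extract_page_data(html):
    """Return the embedded pageData object as a dict, or None if it is absent."""
    soup = BeautifulSoup(html, 'html.parser')
    for script in soup.find_all('script'):
        match = PAGE_DATA_RE.search(script.string or '')
        if match:
            return json.loads(match.group(1))
    return None

# The span itself has no text in the downloaded HTML.
span_text = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('span', id='count').get_text()
data = extract_page_data(SAMPLE_HTML)
print(repr(span_text), data)
```

This only works when the embedded literal is valid JSON; if the script builds the value dynamically, a browser-driven tool would be needed instead.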

How to use BeautifulSoup to get 'contents' in '<span> contents <span> == $0'
