Extracting an object's description with BeautifulSoup in Python

Time: 2018-10-24 21:30:44

Tags: python html web-scraping beautifulsoup extract

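I want to extract the description next to the figure (from "Figurine model" through "Keep the adjusted state:") and store it in the variable information using BeautifulSoup. How can I do this? Here is my code so far, but I don't know how to continue: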

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.myminifactory.com/object/3d-print-the-little-prince-4707')
soup = BeautifulSoup(response.text, "lxml")
information = None  # placeholder: this is where the extracted description should go

Below is the page from which I want to extract the object's description. Thanks in advance!

(Image: The page from where I want to extract the text)

2 answers:

Answer 0: (score: 2)

This works for me. I'm not proud of the script because of the way I use the break statement, but it does work.

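from urllib.request import urlopen
from bs4 import BeautifulSoup as BS

url = r'https://www.myminifactory.com/object/3d-print-the-little-prince-4707'

# Download the page and parse it with the lxml parser.
html = urlopen(url).read()
Soup = BS(html,"lxml")
# Text of the description block next to the figure.
Desc = Soup.find('div',{'class':'short-text text-auto-link'}).text
description = ''
for line in Desc.split('\n'):
    # Stop at the underscore separator line that ends the description.
    if line.strip() == '_________________________________________________________________________':
        break
    # Skip blank lines; kept lines are concatenated with no separator between them.
    if line.strip():
        description += line.strip()
print(description)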
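
The line-filtering step above can also be written without an explicit break. The following is a minimal sketch, not the answerer's code: the helper names is_separator and text_before_separator and the sample text are hypothetical, and it assumes that a separator is any line made up only of underscores (eight or more) rather than one exact string. Unlike the script above, it joins the kept lines with a space.

```python
from itertools import takewhile


def is_separator(line: str) -> bool:
    """Return True for a line made up only of underscores (at least 8 of them)."""
    stripped = line.strip()
    return len(stripped) >= 8 and set(stripped) == {"_"}


def text_before_separator(text: str) -> str:
    """Join the non-empty lines that come before the first separator line."""
    lines = takewhile(lambda line: not is_separator(line), text.splitlines())
    return " ".join(line.strip() for line in lines if line.strip())


# Hypothetical sample text, standing in for the `.text` of the description div.
sample = "First line\n\n  Second line  \n" + "_" * 20 + "\nFooter text"
print(text_before_separator(sample))
```

With the script above, this would be called as text_before_separator(Desc) and the result assigned to information.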

Answer 1: (score: 1)

Find the parent tag, then look for its <p> elements, filtering out empty ones and the ____ separator lines.

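# `soup` is the BeautifulSoup object built in the question's code.
parent = soup.find("div", class_="row container-info-obj margin-t-10")
# Normalize whitespace in each <p>, skipping empty paragraphs and underscore separators.
result = [" ".join(p.text.split()) for p in parent.find_all("p") if p.text.strip() and "_" * 8 not in p.text]
#youtube_v = parent.find("iframe")["src"]
print(result)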
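
To try the paragraph-filtering approach without hitting the site, here is a small self-contained sketch. The HTML string is made up for illustration and is not the real page; only the class names come from the answer above. It uses the built-in "html.parser" so that lxml is not required, and it joins the result into the information variable the question asks for. Note that, as in the answer, only the separator paragraphs themselves are dropped, so paragraphs that follow a separator are kept.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; only the class names
# are taken from the answer above.
html = """
<div class="row container-info-obj margin-t-10">
  <p>Model of   the figurine</p>
  <p>   </p>
  <p>____________________</p>
  <p>Printing
     details</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
parent = soup.find("div", class_="row container-info-obj margin-t-10")

# Collapse runs of whitespace in each paragraph; skip empty paragraphs
# and paragraphs that contain a run of 8 or more underscores.
paragraphs = [
    " ".join(p.get_text().split())
    for p in parent.find_all("p")
    if p.get_text().strip() and "_" * 8 not in p.get_text()
]

# One paragraph per line, stored in the variable named in the question.
information = "\n".join(paragraphs)
print(information)
```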