How do I write HTML text to a file without the tags using BeautifulSoup?

Asked: 2017-07-13 08:14:50

Tags: python-2.7 web-scraping beautifulsoup

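Here is my code. I want to write the data I have scraped to a file, but I only want the text, not the tags. The file ends up containing all the HTML tags as well, and I don't know how to get rid of them.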

import urllib2
from bs4 import BeautifulSoup

file = open("megapy.txt", "w")
file.seek(0)

FullPage = ['New-Arrivals-2017-6', 'Big-Sales-click-here', 'Arduino-Development-boards',
            'Robotics-and-Copters']

urlp1 = "http://www.arduinopak.com/Prd.aspx?Cat_Name="
URL = urlp1 + FullPage[0]

for n in FullPage:
    URL = urlp1 + n
    page = urllib2.urlopen(URL)
    bsObj = BeautifulSoup(page, "html.parser")

    descList = bsObj.findAll('div', attrs={"class": "panel-default"})
    for desc in descList:
        print(desc.get_text(separator=u' '))
        file.write(desc.prettify("utf-8"))

file.close()

However, I keep getting this output in the text file:

<div class="panel panel-default">
 <div class="panel-heading">
  <h5>
   2 X 8 FR4 PCB Prototype Circuit board Double Side
  </h5>
 </div>
 <div class="panel-body">
  <div class="row">
   <div class="col-md-4 pro-image">
    <a href="Prd_Detail.aspx?Prd_ID=20246">
     <img alt="2 X 8 FR4 PCB Prototype Circuit board Double Side" class="img-thumbnail" src="http://upsats.com/Content/Product/img/Product/Thumb/PCB2x8-.jpg"/>
    </a>
   </div>
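
The code above prints `desc.get_text(...)` but writes `desc.prettify("utf-8")`, and `prettify` serializes the element with its markup, which would explain the tags in the file. A minimal sketch of one possible fix follows: write the result of `get_text` instead. It assumes the sample markup below (copied from the output above, with closing tags added so it parses cleanly) stands in for a fetched page. The names `SAMPLE_HTML`, `extract_texts`, and `write_texts` are hypothetical helpers introduced for illustration, and `io.open` is used so the same code runs on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io

from bs4 import BeautifulSoup

# Assumption: shortened copy of the markup shown in the question's output,
# with the closing tags filled in. It replaces the urllib2 download here.
SAMPLE_HTML = u"""
<div class="panel panel-default">
 <div class="panel-heading">
  <h5>
   2 X 8 FR4 PCB Prototype Circuit board Double Side
  </h5>
 </div>
 <div class="panel-body">
  <div class="row">
   <div class="col-md-4 pro-image">
    <a href="Prd_Detail.aspx?Prd_ID=20246">
     <img alt="2 X 8 FR4 PCB Prototype Circuit board Double Side" class="img-thumbnail" src="http://upsats.com/Content/Product/img/Product/Thumb/PCB2x8-.jpg"/>
    </a>
   </div>
  </div>
 </div>
</div>
"""


def extract_texts(html):
    """Return the tag-free text of every div with the class 'panel-default'."""
    soup = BeautifulSoup(html, "html.parser")
    panels = soup.find_all("div", attrs={"class": "panel-default"})
    # strip=True trims each text fragment and skips whitespace-only ones,
    # then the fragments are joined with the separator.
    return [panel.get_text(separator=u" ", strip=True) for panel in panels]


def write_texts(texts, path):
    """Write one text entry per line, encoded as UTF-8."""
    with io.open(path, "w", encoding="utf-8") as out:
        for text in texts:
            out.write(text + u"\n")


if __name__ == "__main__":
    write_texts(extract_texts(SAMPLE_HTML), "megapy.txt")
```

In the original loop, the equivalent change would be to pass each downloaded `page` to a function like `extract_texts` and write the returned strings, rather than calling `prettify`. Text stored in attributes, such as the `alt` value of the `img` tag, is not part of `get_text` output and would have to be read separately if it is needed.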

0 answers:

No answers yet