I'm trying to use BeautifulSoup to get a list of website titles and put them into an Excel spreadsheet.
The text file "c:\websites.txt" contains the following:
www.dailynews.com
www.dailynews.lk
www.dailynews.co.zw
www.gulf-daily-news.com
www.dailynews.gov.bw
My attempt so far:
from bs4 import BeautifulSoup
import urllib2
import xlwt
list_open = open('c:\\websites.txt')
read_list = list_open.read()
line_in_list = read_list.split('\n')
for websites in line_in_list:
    url = "http://" + websites
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    site_title = soup.find_all("title")
    print site_title
This works fine and prints the site titles. But when I add the following below it:
book = xlwt.Workbook(encoding='utf-8', style_compression = 0)
sheet = book.add_sheet('Sheet1', cell_overwrite_ok = True)
for cor, lmn in enumerate(line_in_list):
    sheet.write (cor, 0, site_title)
book.save("C:\\site_titles.xls")
to put the titles neatly into column A of the Excel spreadsheet, one below the other, it doesn't work.
Answer 0 (score: 1)
The error is that you are trying to write a BeautifulSoup object to the cell instead of a string:
Exception: Unexpected data type <class 'bs4.element.Tag'>
Write the object's text value instead, and the file will be written correctly:
for cor, lmn in enumerate(line_in_list):
    sheet.write(cor, 0, site_title[0].text)
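A minimal sketch of the difference between the Tag and its text, written for Python 3 and assuming bs4 is installed. The HTML string and the helper name title_text are made up for illustration, and "html.parser" is simply the parser that ships with Python. The helper also returns an empty string when a page has no title, which site_title[0] alone would not survive.

```python
from bs4 import BeautifulSoup


def title_text(soup):
    """Return the first <title> as a plain string, or "" if the page has none."""
    tags = soup.find_all("title")  # a list of Tag objects, possibly empty
    return tags[0].get_text(strip=True) if tags else ""


# Made-up page, used here in place of a live download.
html = "<html><head><title> Daily News </title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

first = soup.find_all("title")[0]
print(type(first).__name__)    # the class named in the xlwt exception
print(repr(title_text(soup)))  # a plain str, which xlwt can write
```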
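The write loop is also wrong: it runs after the download loop has finished, so site_title only holds the result for the last site and every row gets the same title. Do the write inside the download loop instead.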
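To see why a write loop placed after the download loop repeats a single title, here is a small sketch that needs no network access. fake_title is a hypothetical stand-in for the download-and-parse step, and plain dicts stand in for the worksheet; neither is part of the original scripts.

```python
sites = ["www.dailynews.com", "www.dailynews.lk", "www.dailynews.co.zw"]


def fake_title(site):
    """Hypothetical stand-in for downloading a page and reading its title."""
    return "Title of " + site


# Pattern from the question: download in one loop, write in a second loop.
rows_separate = {}
for site in sites:
    site_title = fake_title(site)    # overwritten on every pass
for row, _ in enumerate(sites):
    rows_separate[row] = site_title  # only the last title is left by now

# Pattern from the final script: write inside the download loop.
rows_inline = {}
for row, site in enumerate(sites):
    rows_inline[row] = fake_title(site)

print(rows_separate)
print(rows_inline)
```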
Final script:
from bs4 import BeautifulSoup
import urllib2
import xlwt
line_in_list = ['www.dailynews.com', 'www.elpais.com']  # get urls from file
book = xlwt.Workbook(encoding='utf-8', style_compression=0)
sheet = book.add_sheet('Sheet1', cell_overwrite_ok=True)
for cor, websites in enumerate(line_in_list):
    url = "http://" + websites
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())
    site_title = soup.find_all("title")
    print site_title
    # write the title text (not the Tag) to row cor, column A
    sheet.write(cor, 0, site_title[0].text)
# save once, after all rows have been written
book.save("site_titles.xls")
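Note that urllib2 and the print statement exist only in Python 2. Below is a rough Python 3 sketch of the same idea, not the code from the answer: urllib.request replaces urllib2, and it assumes bs4 and xlwt are installed for Python 3. The helper names extract_title, fetch_title, save_titles and main, the 10 second timeout, and the choice to write an empty cell when a site cannot be reached or has no title are all assumptions made for illustration.

```python
import urllib.request

import xlwt
from bs4 import BeautifulSoup


def extract_title(html):
    """Return the text of the first <title> tag, or "" if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("title")
    return tag.get_text(strip=True) if tag is not None else ""


def fetch_title(site, timeout=10):
    """Download http://<site> and return its title, or "" if the request fails."""
    try:
        with urllib.request.urlopen("http://" + site, timeout=timeout) as page:
            return extract_title(page.read())
    except OSError:  # URLError, HTTPError and socket timeouts are all OSError
        return ""


def save_titles(sites, path, fetch=fetch_title):
    """Write one title per row into column A of a new .xls workbook."""
    book = xlwt.Workbook(encoding="utf-8")
    sheet = book.add_sheet("Sheet1")
    for row, site in enumerate(sites):
        sheet.write(row, 0, fetch(site))
    book.save(path)


def main(list_path="c:\\websites.txt", out_path="site_titles.xls"):
    """Read one site per line, skipping blank lines, and save the titles."""
    with open(list_path) as handle:
        sites = [line.strip() for line in handle if line.strip()]
    save_titles(sites, out_path)
```

Calling main() reads the site list from the question and writes site_titles.xls. The fetch parameter is there so the spreadsheet step can be tried without a network connection, for example save_titles(["a", "b"], "test.xls", fetch=lambda s: "Title " + s).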