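I am using the following code to scrape data from a URL (given in the code). The code runs, but it produces no output and raises no error. I am new to Python, so this may be a silly question. Can anyone help?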
import csv
import urllib2
import sys
import time
from bs4 import BeautifulSoup
page = urllib2.urlopen('http://www.t-mobile.de/smartphones/0,22727,23392-_3-0--0-all-,00.html').read()
soup = BeautifulSoup(page)
soup.prettify()
with open('TMO_DE_2012-12-26.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(["Date","Month","Day of Week","Device Name","Price"])
    items = soup.findAll('div', {"class": "top"},text=True)
    prices = soup.findAll('strong', {"class": "preis-block"})
    for item, price in zip(items, prices):
        textcontent = u' '.join(price.stripped_strings)
        print unicode(item.string).encode('utf8').strip()
        if textcontent:
            spamwriter.writerow([time.strftime("%Y-%m-%d"),time.strftime("%B"),time.strftime("%A") ,unicode(item.string).encode('utf8').strip(),textcontent])
Answer 0 (score: 0)
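There are no <div class="top"> elements with direct text on that page, so items is an empty list. Because zip() stops at its shortest input, the loop body never runs, which is why you see no output and no error. Remove the text=True filter: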
items = soup.findAll('div', {"class": "top"})
and extract all the text from each element:
item_text = u' '.join(item.stripped_strings)
if textcontent and item_text:
    spamwriter.writerow([time.strftime("%Y-%m-%d"),time.strftime("%B"),time.strftime("%A"), item_text.encode('utf8'), textcontent.encode('utf8')])
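To see the difference on something small, here is a minimal sketch that uses made-up inline HTML instead of the real page (the markup, device names, and prices are assumptions for illustration only). It is written for Python 3 and a recent bs4, where the filter is spelled string=True; text=True is the older name for the same argument.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each div.top holds its text in nested tags.
html = """
<div class="top"><h3>Phone A</h3><span>16 GB</span></div>
<strong class="preis-block">1,00 <span>EUR</span></strong>
<div class="top"><h3>Phone B</h3><span>32 GB</span></div>
<strong class="preis-block">49,95 <span>EUR</span></strong>
"""
soup = BeautifulSoup(html, "html.parser")

# With the string filter, a tag matches only if its .string is set,
# which is not the case when the tag has several children.
with_filter = soup.find_all("div", {"class": "top"}, string=True)

# Without the filter, both divs are found.
without_filter = soup.find_all("div", {"class": "top"})

# stripped_strings yields every nested text fragment, whitespace-trimmed.
names = [" ".join(div.stripped_strings) for div in without_filter]

print(len(with_filter), len(without_filter), names)
```

If the first count is zero and the second is not, the filter is what empties the list.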
Integrated into your existing code, that becomes:
with open('TMO_DE_2012-12-26.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')
    spamwriter.writerow(["Date","Month","Day of Week","Device Name","Price"])
    items = soup.findAll('div', {"class": "top"})
    prices = soup.findAll('strong', {"class": "preis-block"})
    for item, price in zip(items, prices):
        textcontent = u' '.join(price.stripped_strings)
        item_text = u' '.join(item.stripped_strings)
        if item_text and textcontent:
            spamwriter.writerow([time.strftime("%Y-%m-%d"),time.strftime("%B"),time.strftime("%A"), item_text.encode('utf8'),textcontent.encode('utf8')])
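The script stayed silent because zip() simply stops at the shortest input, so an empty list means zero iterations. If you would rather have the script complain, a small guard along these lines could sit in front of the loop. This is only a sketch in Python 3; paired is a hypothetical helper name, not part of bs4 or csv.

```python
def paired(items, prices):
    """Pair items with prices, raising instead of silently yielding nothing."""
    if not items or not prices:
        raise ValueError(
            "nothing matched: %d items, %d prices" % (len(items), len(prices))
        )
    if len(items) != len(prices):
        raise ValueError(
            "count mismatch: %d items, %d prices" % (len(items), len(prices))
        )
    return list(zip(items, prices))


# zip() with an empty input produces no pairs and raises nothing.
silent = list(zip([], ["1,00 EUR", "49,95 EUR"]))

# The guard turns the same situation into a visible error.
try:
    paired([], ["1,00 EUR", "49,95 EUR"])
    guard_message = None
except ValueError as exc:
    guard_message = str(exc)

print(silent, guard_message)
```

In the scraper you would then loop over paired(items, prices) instead of zip(items, prices).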