from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup
from array import array
import csv
url = ['http://cura.free.fr/gauq/902gdA1.html', 'http://cura.free.fr/gauq/902gdA1y.html', 'http://cura.free.fr/gauq/902gdA2.html', 'http://cura.free.fr/gauq/902gdA2y.html', 'http://cura.free.fr/gauq/902gdA3.html']
data = []
m = 0
for i in range(1,len(url)):
if m<url[i]:
page = urlopen(i)
soup = BeautifulSoup(page)
name_box = soup.find("pre")
name = name_box.text.strip()
f = open('output.txt', 'w')
print >> f, 'Filename:', name
f.close()
Answer 0 (score: 0)
You need to indent the block that follows the for statement and the block that follows the if statement.
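The rule this answer relies on can be checked directly. Below is a small Python 3 sketch (the question itself uses Python 2, where the rule is the same). It compiles two hypothetical source strings and reports whether each is accepted, showing that a for statement whose body is not indented is rejected with an IndentationError before anything runs.

```python
# Minimal demonstration: a block statement such as "for" or "if" must be
# followed by an indented body, otherwise compilation fails.
unindented = "for i in range(3):\nprint(i)\n"
indented = "for i in range(3):\n    print(i)\n"


def compiles(source):
    """Return True if source compiles, False if it raises IndentationError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


print(compiles(unindented))  # False
print(compiles(indented))    # True
```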
Try the following code:
from urllib2 import urlopen
from BeautifulSoup import BeautifulSoup
from array import array
import csv

url = [
    'http://cura.free.fr/gauq/902gdA1.html',
    'http://cura.free.fr/gauq/902gdA1y.html',
    'http://cura.free.fr/gauq/902gdA2.html',
    'http://cura.free.fr/gauq/902gdA2y.html',
    'http://cura.free.fr/gauq/902gdA3.html'
]
data = []
m = 0
# Start at index 0 so the first URL is not skipped.
for i in range(len(url)):
    # In Python 2, 0 < any string is True, so this test always passes.
    if m < url[i]:
        # urlopen() needs the URL string, not its index.
        page = urlopen(url[i])
        soup = BeautifulSoup(page)
        name_box = soup.find("pre")
        name = name_box.text.strip()
        # Reopening in 'w' mode on every pass truncates the file,
        # so only the last page's text is kept.
        f = open('output.txt', 'w')
        print >> f, 'Filename:', name
        f.close()
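As @kuro suggested, you may want to move the lines that open and close the file out of the loop. If your goal is to capture all the text inside the pre tag of each of the given URLs, the following code will do that for you.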
f = open('output.txt', 'w')  # open once, before the loop
data = []
m = 0
for i in range(len(url)):
    if m < url[i]:
        page = urlopen(url[i])
        soup = BeautifulSoup(page)
        name_box = soup.find("pre")
        name = name_box.text.strip()
        print >> f, 'Filename:', name
f.close()  # close once, after the loop
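To see why the position of open() matters, here is a small Python 3 sketch that needs no network access. The two helper functions (hypothetical names) write the same three placeholder strings: first by reopening the file in 'w' mode on every iteration, as in the first version of this answer, and then by opening it once before the loop, as in the version above. Opening with 'w' truncates the file, so the first approach keeps only the last line.

```python
import os
import tempfile

# Placeholder strings standing in for the text scraped from each page.
names = ["first page", "second page", "third page"]


def write_reopening(path, items):
    # Like the first version: the file is reopened in "w" mode on every
    # iteration, which truncates it, so only the last item survives.
    for item in items:
        with open(path, "w") as f:
            print("Filename:", item, file=f)


def write_once(path, items):
    # Like the version above: open once before the loop, close after it.
    with open(path, "w") as f:
        for item in items:
            print("Filename:", item, file=f)


def read_lines(path):
    with open(path) as f:
        return f.read().splitlines()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "output.txt")
    write_reopening(path, names)
    print(read_lines(path))  # ['Filename: third page']
    write_once(path, names)
    print(read_lines(path))  # ['Filename: first page', 'Filename: second page', 'Filename: third page']
```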
Answer 1 (score: 0)
I wrote the code like this:
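f = open('output.txt', 'w')
# Iterate over the URLs directly instead of by index.
for i in url:
    page = urlopen(i)
    soup = BeautifulSoup(page)
    name_box = soup.find("pre")
    # Encode to UTF-8 bytes so any non-ASCII characters can be written
    # without a UnicodeEncodeError.
    name = name_box.text.encode('utf-8').strip()
    print >> f, 'Filename:', name
f.close()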
It works for me.
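For anyone running this today: urllib2 and BeautifulSoup 3 are Python 2 only. The following is a rough Python 3 sketch of the same approach, not the answerer's code. To stay dependency-free it swaps BeautifulSoup for the standard library's html.parser; the class and function names are hypothetical, and falling back to Latin-1 when the server declares no charset is an assumption about these pages. With bs4 installed, BeautifulSoup(html, "html.parser").find("pre").get_text() would be the closer equivalent.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# URLs from the question.
URLS = [
    "http://cura.free.fr/gauq/902gdA1.html",
    "http://cura.free.fr/gauq/902gdA1y.html",
    "http://cura.free.fr/gauq/902gdA2.html",
    "http://cura.free.fr/gauq/902gdA2y.html",
    "http://cura.free.fr/gauq/902gdA3.html",
]


class PreTextParser(HTMLParser):
    """Collect all text inside the first <pre> element, including text in
    nested tags (roughly what soup.find("pre").text returns)."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while inside the first <pre>
        self._done = False   # True once the first <pre> has closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre" and not self._done:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        if self._depth:
            self.parts.append(data)


def extract_pre_text(html):
    """Return the stripped text of the first <pre>, or "" if there is none."""
    parser = PreTextParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def scrape(urls, out_path="output.txt"):
    """Write the <pre> text of every URL to one file, opened once."""
    with open(out_path, "w", encoding="utf-8") as out:
        for u in urls:
            with urlopen(u) as page:
                # Assumption: fall back to Latin-1 when no charset is declared.
                charset = page.headers.get_content_charset() or "latin-1"
                html = page.read().decode(charset, errors="replace")
            print("Filename:", extract_pre_text(html), file=out)
```

Calling scrape(URLS) fetches the five pages and writes one "Filename: ..." entry per page to output.txt. Writing through a UTF-8 text file replaces the explicit .encode('utf-8') call, and unlike soup.find("pre").text, extract_pre_text returns an empty string instead of raising when a page has no pre element.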