Hello, I am trying to scrape www.allocine.fr to get the latest movies.
I wrote the following script:
# -*- coding: utf-8 -*-
import urllib
import re
page = ["?page=1", "?page=2", "?page=3"]
i = 0
while i < len(page):
    # Fetch one page of the "now showing" listing
    url = "http://www.allocine.fr/film/aucinema/" + page[i]
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    # Capture the title text inside each film link
    regex = '<a class="no_underline" href="/film/fichefilm_gen_cfilm=[^.]*.html">\n(.+?)\n</a>'
    pattern = re.compile(regex)
    movie = re.findall(pattern, htmltext)
    i += 1
    # Join this page's titles and print them
    movielist = '\n '.join(movie)
    print movielist
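The problem is that the first item in each list has no space before it. What I mean is that, in the output, the last item of the first list and the first item of the second list are not separated by a space.
It looks like this: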
Something in 1st list
 something2 in 1st list
 something3 in 1st list
Otherthing in 2nd list
 otherthing2 in 2nd list
 otherthing3 in 2nd list
====
I would like it to look like: something something something otherthing otherthing
Answer 0 (score: 1)
You can:
Print a space before the joined string:
movielist = ' ' + '\n '.join(movie)
Or prepend a space to each item:
movielist = '\n'.join([' ' +i for i in movie])
Example:
>>> print '\n '.join(movie)
something
 something
 something
 otherthing
 otherthing
>>> print ' '+'\n '.join(movie)
 something
 something
 something
 otherthing
 otherthing
>>> print '\n'.join([' ' +i for i in movie])
 something
 something
 something
 otherthing
 otherthing
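For reference, the same comparison can be written as a small Python 3 sketch (the thread itself uses Python 2 print statements). The `movie` list below is made-up sample data standing in for the regex matches, and `indent_all` is a hypothetical helper name, not something from the original script.

```python
# Sample data standing in for the titles returned by re.findall (assumption).
movie = ["something", "something2", "something3"]


def indent_all(items, prefix=" "):
    """Return the items one per line, each preceded by the same prefix."""
    return "\n".join(prefix + item for item in items)


# '\n '.join only puts the space *between* items, so the first line is bare.
joined_only = "\n ".join(movie)

# Either fix below gives every line a leading space.
fixed_prefix = " " + "\n ".join(movie)
fixed_each = indent_all(movie)

# repr() makes the leading spaces and newlines visible.
print(repr(joined_only))
print(repr(fixed_prefix))
print(repr(fixed_each))
```

One difference worth knowing: for an empty list the first fix still yields a single space, while the per-item version yields an empty string.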
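Answer 1 (score: 0)
If you just want the items listed side by side, change the print statement to print "foo" % bar, (note the trailing comma, which in Python 2 suppresses the newline after the print).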
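A rough Python 3 equivalent, assuming the goal really is one line of titles: the trailing-comma form of print does not exist there, so `end=" "` plays that role. The `pages` data is invented for illustration and stands in for the per-page match lists.

```python
# Made-up titles standing in for the matches from two result pages (assumption).
pages = [["something", "something2"], ["otherthing", "otherthing2"]]

# Option 1: print each title followed by a space instead of a newline.
for movie in pages:
    for title in movie:
        print(title, end=" ")
print()  # finish the line

# Option 2: collect every title first, then join once with a single space.
all_titles = [title for movie in pages for title in movie]
side_by_side = " ".join(all_titles)
print(side_by_side)
```

Collecting the titles from all pages before joining also sidesteps the original issue, since the separator is then applied once across the whole list rather than per page.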