Python IDLE 2.7
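I am trying to get all company names from A to Z and save all the results to a CSV file. This is the first URL: http://app.core-apps.com/weftec2014/exhibitors/list/A
The following code works for each page if I manually change the last letter of the URL 26 times, e.g. http://app.core-apps.com/weftec2014/exhibitors/list/Z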
import urllib2
response = urllib2.urlopen('http://app.core-apps.com/weftec2014/exhibitors/list/A')
page = response.read()
page = page[4632:]
def get_next_target(page):
    start_link = page.find("<a href='/weftec2014/exhibitors/")
    if start_link == -1:
        return None, 0
    else:
        start_place = start_link+73 #to get company names after the first <div>
        end_place = page.find("</div>", start_place)
        item = page[start_place:end_place]
        return item, end_place
def print_all_com(page): #return company names
    results = []
    while True:
        item, end_place = get_next_target(page)
        if item:
            results.append( [ item.strip() ] )
            #print item
            page = page[end_place:]
        else:
            break
    return results
data = print_all_com(page)
import csv
with open('weftec.csv','w') as f:
    writer = csv.writer(f)
    writer.writerows(data)
But I want Python to loop through A - Z and return all the company names AT ONCE. So I added another block of code below the previous script:
letter = ['A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z']
url = 'http://app.core-apps.com/weftec2014/exhibitors/list/'
for n in range(0, len(letter)):
    target = []
    url_letter = url+letter[n]
    response = urllib2.urlopen(url_letter)
    page = response.read()
    page = page[4632:]
    data = print_all_com(page)
    target.append(data)
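I think something is wrong with the script above, because len(target) is 1 rather than the total number of companies from A to Z.
When I save the results to a CSV file, I get a very strange result: only the company names from the Z page. See the exact output below.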
['ZAPS Technologies, Inc'] ['Zoeller Engineered Products']
['ZAPS Technologies, Inc'] ['Zoeller Engineered Products']
I feel something is wrong in the second block, but I really don't know what...
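Answer 0: (score: 0)
You are resetting target to an empty list on every iteration of that loop, so after the loop it only holds the data from the last page (Z). Initialize target as an empty list once, outside the loop, instead.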
target = []
for n in range(0, len(letter)):
    ...
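For illustration, here is a minimal sketch of how the whole loop might look with that fix applied. It is not the asker's exact script: `collect_all`, `fetch_rows`, `fake_pages`, and `fake_fetch` are hypothetical names, and the fake data stands in for the `urllib2.urlopen` plus `print_all_com` step so the example runs without network access. It also assumes the goal is one flat list with one row per company, which is why it uses `extend` rather than `append`; with `append`, target would hold 26 nested per-letter lists instead.

```python
import string


def collect_all(fetch_rows, letters=string.ascii_uppercase):
    """Gather the rows for every letter into one flat list.

    fetch_rows is a hypothetical callable (letter -> list of [name] rows)
    standing in for urlopen + print_all_com from the question.
    """
    target = []  # initialised once, before the loop
    for letter in letters:
        data = fetch_rows(letter)
        # extend adds each [name] row individually;
        # append would add the whole per-letter list as a single element
        target.extend(data)
    return target


# Stand-in data so the sketch runs offline.
# The 'A' entry is made up; the 'Z' names come from the question's output.
fake_pages = {
    'A': [['Example Company A']],
    'Z': [['ZAPS Technologies, Inc'], ['Zoeller Engineered Products']],
}


def fake_fetch(letter):
    # Letters without fake data simply contribute no rows
    return fake_pages.get(letter, [])


all_rows = collect_all(fake_fetch)
print(len(all_rows))
```

In the real script, `fetch_rows` would presumably wrap the `urllib2.urlopen(url + letter)` call and `print_all_com`, and the resulting list could be passed straight to `writer.writerows` to get one company per CSV line.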