Why do I only get the stats for the last player in PLAYER_NAME?
I want to get the stats for every player in PLAYER_NAME.
import csv
import requests
from bs4 import BeautifulSoup
import urllib

PLAYER_NAME = ["andy-murray/mc10", "rafael-nadal/n409"]
URL_PATTERN = 'http://www.atpworldtour.com/en/players/{}/player-stats?year=0&surfaceType=clay'
for item in zip (PLAYER_NAME):
    url = URL_PATTERN.format(item)
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('div', attrs={'class': 'mega-table-wrapper'})
    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = (cell.text.encode("utf-8").strip())
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)

outfile = open("./tennis.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Name", "Stat"])
writer.writerows(list_of_rows)
Answer 0 (score: 2)
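As mentioned in the comments, you are recreating list_of_rows on every iteration, so only the last player's rows survive the loop. To fix it, move list_of_rows outside the for loop so that it is created once, and extend it with each player's rows instead of starting over.
Your code also has a few other problems:
- zip is redundant. It wraps each name in a one-element tuple, which produces a malformed URL when formatted. Just iterate over PLAYER_NAME directly, and while you are at it, consider renaming it to PLAYER_NAMES (since it is a list of names).
- The format string only has empty braces. On Python 2.6 you need a number inside the braces to indicate the position of the argument in format, in this case {0}. From Python 2.7 on, empty braces are numbered automatically, so {0} is optional there.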
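To make the zip point concrete, here is a small offline illustration. It is only a sketch: PATTERN is a shortened stand-in for the real URL_PATTERN, and no request is made.

```python
# Illustration only: what zip() yields for a single list, and how that
# affects str.format(). PATTERN is a shortened, hypothetical stand-in
# for the real URL pattern.
PLAYER_NAMES = ["andy-murray/mc10", "rafael-nadal/n409"]
PATTERN = "players/{0}/player-stats"

# zip() over one list yields one-element tuples such as ('andy-murray/mc10',)
zipped_urls = [PATTERN.format(item) for item in zip(PLAYER_NAMES)]

# Iterating over the list directly yields plain strings
plain_urls = [PATTERN.format(item) for item in PLAYER_NAMES]

print(zipped_urls[0])  # players/('andy-murray/mc10',)/player-stats
print(plain_urls[0])   # players/andy-murray/mc10/player-stats
```

The tuple's repr ends up inside the path, which is why the zipped version cannot point at a valid player page.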
import requests
from bs4 import BeautifulSoup

PLAYER_NAMES = ["andy-murray/mc10", "rafael-nadal/n409"]
URL_PATTERN = 'http://www.atpworldtour.com/en/players/{0}/player-stats?year=0&surfaceType=clay'
list_of_rows = []  # created once, so rows from every player accumulate
for item in PLAYER_NAMES:
    url = URL_PATTERN.format(item)
    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('div', attrs={'class': 'mega-table-wrapper'})
    # for row in table.findAll('tr'):
    #     list_of_cells = []
    #     for cell in row.findAll('td'):
    #         text = (cell.text.encode("utf-8").strip())
    #         list_of_cells.append(text)
    #     list_of_rows.append(list_of_cells)  # one list of cells per row; extend here would flatten the cells
    # Incidentally, the for loop above could also be written as:
    list_of_rows += [
        [cell.text.encode("utf-8").strip() for cell in row.findAll('td')]
        for row in table.findAll('tr')
    ]
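For completeness, the accumulate-then-write pattern can be tried without touching the network. The following is a minimal sketch in Python 3, not the original poster's code: FAKE_PAGES, its cell values, and the helper collect_rows are invented placeholders standing in for the scraped tables, and an in-memory buffer replaces tennis.csv.

```python
import csv
import io

PLAYER_NAMES = ["andy-murray/mc10", "rafael-nadal/n409"]

# Hypothetical placeholder rows standing in for what the scraper would
# return for each player; the values are made up for illustration.
FAKE_PAGES = {
    "andy-murray/mc10": [["stat-a", "1"], ["stat-b", "2"]],
    "rafael-nadal/n409": [["stat-a", "3"], ["stat-b", "4"]],
}

def collect_rows(player_names, fetch_rows):
    """Gather the rows of every player into a single list."""
    list_of_rows = []  # created once, before the loop
    for name in player_names:
        list_of_rows += fetch_rows(name)  # extend with this player's rows
    return list_of_rows

rows = collect_rows(PLAYER_NAMES, FAKE_PAGES.get)

# In Python 3 the csv module expects text mode; for a real file this would
# be open("tennis.csv", "w", newline="") and the cells would not be encoded.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Stat"])
writer.writerows(rows)
print(buffer.getvalue())
```

Because list_of_rows is created before the loop, the output contains the header plus two rows per player rather than only the last player's rows.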