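Loop in a Python script only returns the last result

Time: 2016-04-20 14:46:23

Tags: python loops

Why do I only get the statistics for the last player in PLAYER_NAME?

I want to get the statistics for every player in PLAYER_NAME.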

import csv
import requests
from bs4 import BeautifulSoup
import urllib

PLAYER_NAME = ["andy-murray/mc10", "rafael-nadal/n409"]
URL_PATTERN = 'http://www.atpworldtour.com/en/players/{}/player-stats?year=0&surfaceType=clay'
for item in zip (PLAYER_NAME):
    url = URL_PATTERN.format(item)

    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('div', attrs={'class': 'mega-table-wrapper'})

    list_of_rows = []
    for row in table.findAll('tr'):
        list_of_cells = []
        for cell in row.findAll('td'):
            text = (cell.text.encode("utf-8").strip())
            list_of_cells.append(text)
        list_of_rows.append(list_of_cells)


outfile = open("./tennis.csv", "wb")
writer = csv.writer(outfile)
writer.writerow(["Name", "Stat"])
writer.writerows(list_of_rows)

1 answer:

Answer 0 (score: 2)

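As mentioned in the comments, you are recreating list_of_rows on every iteration. Because the CSV file is written only once, after the loop has finished, it ends up containing just the last player's rows. To fix it, move list_of_rows outside the for loop so that the rows of every player accumulate in a single list of lists. Inside the loop you can either keep appending one list of cells per row, or extend list_of_rows with all of a player's rows at once.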
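
To see the effect of where the list is created without any network access, here is a minimal sketch. The `fake_rows` dictionary, its placeholder values, and the two helper functions are made up for illustration; they only stand in for the scraped table rows.

```python
# Hypothetical stand-in for the scraped tables: player -> rows of cells.
fake_rows = {
    "andy-murray/mc10": [["stat-a", "1"], ["stat-b", "2"]],
    "rafael-nadal/n409": [["stat-a", "3"], ["stat-b", "4"]],
}
players = ["andy-murray/mc10", "rafael-nadal/n409"]


def collect_reset_inside(names):
    """Buggy variant: the list is recreated on every iteration."""
    for name in names:
        rows = []  # the previous player's rows are discarded here
        for row in fake_rows[name]:
            rows.append(row)
    return rows


def collect_once(names):
    """Fixed variant: the list is created once, before the loop."""
    rows = []
    for name in names:
        for row in fake_rows[name]:
            rows.append(row)
    return rows


print(len(collect_reset_inside(players)))  # 2 (last player only)
print(len(collect_once(players)))          # 4 (both players)
```

The only difference between the two functions is the position of `rows = []`, which is the same change the question's code needs.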

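Your code also has a couple of other problems:

  • zip is redundant. It actually turns each name into a one-element tuple, which produces a malformed URL when it is formatted. Just iterate over PLAYER_NAME directly, and while you are at it, consider renaming it to PLAYER_NAMES (since it is a list of names).
  • The URL pattern uses empty braces. On Python 2.6 you need a number that gives the position of the argument passed to format, in this case {0}. On Python 2.7 and 3.1 or later, empty braces are numbered automatically, so {0} is optional there.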
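
Both points can be checked in a few lines. This is only a sketch: `PATTERN` is a shortened, hypothetical stand-in for the real URL pattern, and the behaviour shown assumes Python 2.7 or any Python 3 version.

```python
PLAYER_NAMES = ["andy-murray/mc10", "rafael-nadal/n409"]
PATTERN = "players/{}/player-stats"  # shortened, hypothetical pattern

# zip() over a single list wraps every element in a 1-tuple.
zipped = list(zip(PLAYER_NAMES))
print(zipped[0])  # ('andy-murray/mc10',)

# Formatting the tuple inserts its string form, parentheses and all.
bad_url = PATTERN.format(zipped[0])
print(bad_url)  # players/('andy-murray/mc10',)/player-stats

# Iterating over the list directly yields plain strings.
good_url = PATTERN.format(PLAYER_NAMES[0])
print(good_url)  # players/andy-murray/mc10/player-stats

# With auto-numbering available, "{}" and "{0}" give the same result.
same = "players/{0}/player-stats".format(PLAYER_NAMES[0]) == good_url
print(same)  # True
```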


PLAYER_NAMES = ["andy-murray/mc10", "rafael-nadal/n409"]
URL_PATTERN = 'http://www.atpworldtour.com/en/players/{0}/player-stats?year=0&surfaceType=clay'
list_of_rows = []
for item in PLAYER_NAMES:
    url = URL_PATTERN.format(item)

    response = requests.get(url)
    html = response.content
    soup = BeautifulSoup(html)
    table = soup.find('div', attrs={'class': 'mega-table-wrapper'})

    # for row in table.findAll('tr'):
    #     list_of_cells = []
    #     for cell in row.findAll('td'):
    #         text = (cell.text.encode("utf-8").strip())
    #         list_of_cells.append(text)
    #     list_of_rows.append(list_of_cells)  # keep one list per row; extend() here would flatten the cells

    # Incidentally, the for loop above could also be written as:
    list_of_rows += [
        [cell.text.encode("utf-8").strip() for cell in row.findAll('td')]
        for row in table.findAll('tr')
    ]
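
The snippet above stops before the file is written. A minimal sketch of that last step follows, assuming Python 3, where CSV files are opened in text mode with newline="" rather than "wb". The `write_rows` helper and the placeholder rows are hypothetical and only stand in for the accumulated list_of_rows.

```python
import csv
import io


def write_rows(outfile, rows):
    """Write the header once, then every accumulated row."""
    writer = csv.writer(outfile)
    writer.writerow(["Name", "Stat"])
    writer.writerows(rows)


# Placeholder rows standing in for list_of_rows after the loop has finished.
list_of_rows = [["stat-a", "1"], ["stat-b", "2"]]

# An in-memory buffer keeps the example self-contained. For a real file,
# use: with open("./tennis.csv", "w", newline="") as outfile: ...
buffer = io.StringIO()
write_rows(buffer, list_of_rows)
print(buffer.getvalue())
```

On Python 3 the cells should stay as str, so the `.encode("utf-8")` call from the scraping loop would be dropped; otherwise the csv module writes the bytes representation (for example b'...') into the file.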