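Python CSV homework project

Asked: 2012-04-05 21:55:52

Tags: python csv

I have a homework assignment that involves reading a file with the csv module and writing functions.

The basic idea is to calculate rusher ratings for football players over two years, using the data in a file provided to us. A sample file looks like this: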

name, ,pos,team,g,rush,ryds,rtd,rtdr,ravg,fum,fuml,fpts,year
A.J.,Feeley,QB,STL,5,3,4,0,0,1.3,3,2,20.3,2011
Aaron,Brown,RB,DET,1,1,0,0,0,0,0,0,0.9,2011
Aaron,Rodgers,QB,GB,15,60,257,3,5,4.3,4,0,403.4,2011
Adrian,Peterson,RB,MIN,12,208,970,12,5.8,4.7,1,0,188.9,2011
Ahmad,Bradshaw,RB,NYG,12,171,659,9,5.3,3.9,1,1,156.6,2011

In other words, we have to discard the first line of the file and then read the remaining lines, splitting each one on commas.
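
A minimal sketch of that reading step, assuming a few of the sample rows are held in a list of strings. The name `sample_lines` is made up for illustration; a real run would pass an open file object to `csv.reader` instead.

```python
import csv

# Hypothetical stand-in for the data file: csv.reader accepts any iterable of lines.
sample_lines = [
    "name, ,pos,team,g,rush,ryds,rtd,rtdr,ravg,fum,fuml,fpts,year",
    "A.J.,Feeley,QB,STL,5,3,4,0,0,1.3,3,2,20.3,2011",
    "Adrian,Peterson,RB,MIN,12,208,970,12,5.8,4.7,1,0,188.9,2011",
]

reader = csv.reader(sample_lines)
header = next(reader)  # consume the header row so only data rows remain
rows = list(reader)    # each row is a list of strings, already split on commas

# Every field is still a string at this point, so numbers need converting.
attempts = int(rows[1][5])  # column 5 is 'rush' (rushing attempts)
```

Using `next(reader)` keeps the header available for later lookups instead of throwing it away.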

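To calculate the rusher rating, we need:

Yds is the average yardage gained per attempt. This is [total yards / (4.05 * attempts)]. If this number is greater than 2.375, then 2.375 should be used instead.

perTDs is the percentage of touchdowns per carry. This is [(39.5 * touchdowns) / attempts]. If this number is greater than 2.375, then 2.375 should be used instead.

perFumbles is the percentage of fumbles per carry. This is [2.375 - ((21.5 * fumbles) / attempts)].

The rusher rating is [Yds + perTDs + perFumbles] * (100 / 4.5).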
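
As a sanity check on these formulas, here is a minimal sketch that applies them to one row of the sample file. The function name `rusher_rating` is illustrative and not part of the assignment, and the caps are written with `min()` rather than `if` statements.

```python
def rusher_rating(attempts, total_yards, touchdowns, fumbles):
    """Rusher rating per the formulas above; attempts must be non-zero."""
    attempts = float(attempts)  # guarantees true division on Python 2 as well
    yds = min(total_yards / (4.05 * attempts), 2.375)    # capped at 2.375
    per_tds = min(39.5 * touchdowns / attempts, 2.375)   # capped at 2.375
    per_fumbles = 2.375 - (21.5 * fumbles / attempts)    # no cap in the spec
    return (yds + per_tds + per_fumbles) * (100 / 4.5)

# Adrian Peterson, 2011: 208 attempts, 970 yards, 12 touchdowns, 1 fumble
peterson = rusher_rating(208, 970, 12, 1)
```

For this row the three components come to roughly 1.15, 2.28 and 2.27, which gives a rating of about 126.7.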

My code so far:

playerinfo = []
teaminfo10 = []
teaminfo11 = []

import csv

file = raw_input("Enter filename: ")
read = open(file,"rU")
read.readline()
fileread = csv.reader(read)

#Each line is iterated through, and if rush attempts are greater than 10, the
#player may be used for further statistics.
for playerData in fileread:
    if int(playerData[5]) > 10:

        attempts = int(playerData[5])
        totalYards = int(playerData[6])
        touchdowns = int(playerData[7])
        fumbles = int(playerData[10])

        #Rusher rating for each player is found. This rating, coupled with other
        #data about the player is formatted and appended into a list of players.
        rushRating = ratingCalc(attempts,totalYards,touchdowns,fumbles)
        rusherData = rushFunc(playerData,rushRating)
        playerinfo.append(rusherData)

        #Different data about the player is formatted and added to one of two
        #lists of teams, based on year. 
        teamData = teamFunc(playerData)
        if playerData[13] == '2010':
            teaminfo10.append(teamData)
        else:
            teaminfo11.append(teamData)

#The list of players is sorted in order of decreasing rusher rating.
playerinfo.sort(reverse = True)
#The two team lists of players are sorted by team.
teaminfo10.sort()
teaminfo11.sort()

print "The following statistics are only for the years 2010 and 2011."
print "Only those rushers who have rushed more than 10 times are included."
print
print "The top 50 rushers based on their rusher rating in individual years are:"

#50 players, in order of decreasing rusher ratings, are printed along with other
#data.
rushPrint(playerinfo,50)

#A similar list of running backs is created, in order of decreasing rusher
#ratings.
RBlist = []
for player in playerinfo:
    if player[2] == 'RB':
        RBlist.append(player)

print "\nThe top 20 running backs based on their rusher rating in individual \
years are:"
#The top 20 running backs on the RBlist are printed, with other data.
rushPrint(RBlist,20)


#The teams with the greatest overall rusher rating (if their attempts are
#greater than 10) are listed in order of decreasing rusher rating, for both 2010
#and 2011.
teamListFunc(teaminfo10,'2010')

teamListFunc(teaminfo11,'2011')

#The player(s) with the most yardage is printed.
yardsList = mostStat(6,fObj,False)
print "\nThe people who rushed for the most yardage are:"
for item in yardsList:
    print "%s rushing for %d yards for %s in %s."\
    % (item[1],item[0],item[2],item[3])

#The player(s) with the most touchdowns is printed.
TDlist = mostStat(7,fObj,False)
print"\nThe people who have scored the most rushing touchdowns are:"
for item in TDlist:
    print "%s rushing for %d touchdowns for %s in %s."\
    % (item[1],item[0],item[2],item[3])

#The player(s) with the most yardage per rushing attempt is printed.
ypaList = mostStat(6,fObj,True)
print"\nThe people who have the highest yards per rushing attempt with over 10 \
rushes are:"
for item in ypaList:
    print "%s with a %.2f yards per attempt rushing average for %s in %s."\
    % (item[1],item[0],item[2],item[3])

#The player(s) with the most fumbles is printed.
fmblList = mostStat(10,fObj,False)
print"\nThere are %d people with the most fumbles. They are:" % (len(fmblList))
for item in fmblList:
    print "%s with %d fumbles for %s in %s." % (item[1],item[0],item[2],item[3])


def ratingCalc(atts,totalYrds,TDs,fmbls):
    """Calculates rusher rating."""
    yrds = totalYrds / (4.05 * atts)
    if yrds > 2.375:
        yrds = 2.375

    perTDs = 39.5 * TDs / atts
    if perTDs > 2.375:
        perTDs = 2.375

    perFumbles = 2.375 - (21.5 * fmbls / atts)

    rating = (yrds + perTDs + perFumbles) * (100/4.5)

    return rating    

def rushFunc(information,rRating):
    """Formats player info into [rating,name,pos,team,yr,atts]"""
    rusherInfo = []
    rusherInfo.append(rRating)
    name = information[0] + ' ' + information[1]
    rusherInfo.append(name)
    rusherInfo.append(information[2])
    rusherInfo.append(information[3])
    rusherInfo.append(information[13])
    rusherInfo.append(information[5])

    return rusherInfo


def teamFunc(plyrInfo):
    """Formats player info into [team,atts,yrds,TDs,fmbls] for team sorting"""
    teamInfo = []
    teamInfo.append(plyrInfo[3])
    teamInfo.append(plyrInfo[5])
    teamInfo.append(plyrInfo[6])
    teamInfo.append(plyrInfo[7])
    teamInfo.append(plyrInfo[10])

    return teamInfo

def rushPrint(lst,num):
    """Prints players and their data in order of rusher rating."""
    print "Name                           Pos   Year  Attempts   Rating  Team"
    #Slicing avoids an IndexError when lst holds fewer than num players.
    for index in lst[:num]:
        print "%-30s %-5s %4s  %5s      %3.2f  %s"\
              % (index[1],index[2],index[4],index[5],index[0],index[3])

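So yes, I still have a lot of functions left to define. But what do you think of the code so far? Is it inefficient? Can you tell me what is wrong with it? It looks to me like this code is going to be very long (around 300 lines), but the teacher said it should be a relatively short project.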

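1 Answer:

Answer 0 (score: 3)

Here is a piece of code that can greatly simplify the whole project.

It may take a little while to understand the task at hand, but overall, working with the correct data types and 'associative arrays' (dicts) will make your life much easier.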

import csv

#open the csv file; DictReader yields one dictionary per row, keyed by the
#names in the header line
reader = csv.DictReader(open('mycsv.txt', 'r'))

#store each row's dictionary as a separate element in a list, so the data is
#not a one-time iterator and all your info stays easily accessible
list_of_players = map(dict, reader)

for i in list_of_players:
    #turn the fields that are meant to be integers from strings into ints
    for stat in ['rush','ryds','rtd','fum','fuml','year']:
        i[stat] = int(i[stat])
    #turn the fields that are meant to be floats from strings into floats
    for stat in ['fpts','ravg','rtdr']:
        i[stat] = float(i[stat])

for i in list_of_players:
    #now you can access and loop through your players with meaningful names,
    #using 'fpts' rather than predetermined numbers like [5]
    #(the last-name column is keyed by ' ' because that is its header)
    print i['name'], i[' '], i['fpts']

This sample code shows how easy it is to work with the players' names and their statistics (here: first name, last name, and fpts):

>>> 
A.J. Feeley 20.3
Aaron Brown 0.9
Aaron Rodgers 403.4
Adrian Peterson 188.9
Ahmad Bradshaw 156.6

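Of course, some adjustments are needed to get all the requested statistics (maximums, etc.), but keeping the data types correct from the start makes those tasks much less of a burden.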
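
For instance, a "most yardage" statistic reduces to a `max()` over the list of dicts. The following is a minimal sketch, assuming rows shaped like the converted entries of `list_of_players`; the values come from the sample file in the question, and the helper name `leaders` is made up for illustration.

```python
# Hypothetical rows, shaped like list_of_players after the type conversion.
players = [
    {'name': 'Aaron', ' ': 'Rodgers', 'team': 'GB',
     'rush': 60, 'ryds': 257, 'rtd': 3, 'year': 2011},
    {'name': 'Adrian', ' ': 'Peterson', 'team': 'MIN',
     'rush': 208, 'ryds': 970, 'rtd': 12, 'year': 2011},
    {'name': 'Ahmad', ' ': 'Bradshaw', 'team': 'NYG',
     'rush': 171, 'ryds': 659, 'rtd': 9, 'year': 2011},
]

def leaders(rows, stat):
    """Return every row that ties for the highest value of `stat`."""
    best = max(row[stat] for row in rows)
    return [row for row in rows if row[stat] == best]

# print() with a single argument reads the same on Python 2 and Python 3.
for p in leaders(players, 'ryds'):
    print("%s %s rushing for %d yards for %s in %d."
          % (p['name'], p[' '], p['ryds'], p['team'], p['year']))
```

Returning a list rather than a single row keeps ties, which matches the "player(s)" wording in the question's comments. The same helper works for 'rtd' or 'fum' without any column numbers.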

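With these structures, this assignment can be done in far fewer than 300 lines, and the more you use Python, the more of its conventional idioms you will pick up. lambda expressions and sorted() are examples of tools you will fall in love with in time!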
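
As a taste of those idioms, here is a minimal sketch that ranks players by yards per attempt using `sorted()` with a `lambda` key. It assumes rows like `list_of_players`, with values taken from the sample file in the question; first and last names are pre-joined here for brevity, and all variable names are illustrative.

```python
# Hypothetical rows with already-converted numeric fields.
players = [
    {'name': 'Aaron Rodgers', 'pos': 'QB', 'rush': 60, 'ryds': 257},
    {'name': 'Adrian Peterson', 'pos': 'RB', 'rush': 208, 'ryds': 970},
    {'name': 'Ahmad Bradshaw', 'pos': 'RB', 'rush': 171, 'ryds': 659},
]

# Keep only rushers with more than 10 attempts, as the question's code does.
eligible = [p for p in players if p['rush'] > 10]

# Sort by yards per attempt, highest first.
# float() avoids integer division on Python 2.
ranked = sorted(eligible,
                key=lambda p: float(p['ryds']) / p['rush'],
                reverse=True)

# Slicing caps the list at 20 entries and is safe when fewer are available.
top_rbs = [p['name'] for p in ranked if p['pos'] == 'RB'][:20]
```

Swapping the `lambda` for one that returns a rusher rating would give the "top 50 rushers" list in the same three steps.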