Formatting data from a CSV file in Python (calculating averages)

Time: 2016-03-14 14:20:29

Tags: csv python-3.x

import csv
with open('Class1scores.csv') as inf:
    for line in inf:
        parts = line.split() 
        if len(parts) > 1:   
            print (parts[4])   


f = open('Class1scores.csv')
csv_f = csv.reader(f)
newlist = []
for row in csv_f:

    row[1] = int(row[1])
    row[2] = int(row[2])
    row[3] = int(row[3])

    maximum = max(row[1:3])
    row.append(maximum)
    average = round(sum(row[1:3])/3)
    row.append(average)
    newlist.append(row[0:4])

averageScore = [[x[3], x[0]] for x in newlist]
print('\nStudents Average Scores From Highest to Lowest\n')

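This code is meant to read the CSV file. Using the three score columns (column 0 holds the user's name), it should add up all three scores and divide by 3, but it does not calculate the correct average; it just takes the score from the last column.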

csv file

2 answers:

Answer 0 (score: 3)

Basically, you want statistics for each row. In general, you would do something like this:

import csv

with open('data.csv', 'r') as f:
    rows = csv.reader(f)
    for row in rows:
        name = row[0]
        # csv.reader yields strings, so convert each score to int first
        scores = [int(s) for s in row[1:]]

        # calculate statistics of scores
        attributes = {
           'NAME': name,
           'MAX' : max(scores),
           'MIN' : min(scores),
           'AVE' : 1.0 * sum(scores) / len(scores)
        }

        output_mesg ="name: {NAME:s} \t high: {MAX:d} \t low: {MIN:d} \t ave: {AVE:f}"
        print(output_mesg.format(**attributes))

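Try not to worry about whether doing a particular thing is inefficient locally. A good Pythonic script should be as readable as possible to everyone.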
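In the same spirit of readability, the manual sum-divided-by-length expression could be swapped for `statistics.mean` from the standard library. This is only a sketch of that option, not part of the original answer, and the scores below are made-up values for a single student.

```python
from statistics import mean

# Hypothetical scores for one student (assumption: already converted to int).
scores = [7, 9, 10]

highest = max(scores)
lowest = min(scores)
average = mean(scores)  # arithmetic mean, same as sum(scores) / len(scores)

print("high: {:d} \t low: {:d} \t ave: {:f}".format(highest, lowest, average))
```

Whether this reads better than `sum(scores) / len(scores)` is a matter of taste; both give the same value for a non-empty list of integers.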

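In your code, I spot two errors:

  1. Appending to row does not change the result: row is a local variable in the for loop, and only row[0:4] (the name and the three raw scores) is stored in newlist, so the appended maximum and average are thrown away. That is why x[3] ends up being the last score column.

  2. row[1:3] gives only the second and third elements. row[1:4] gives what you want, as does row[1:]. Slices in Python are end-exclusive.

And some questions for you to consider:

    If I can open the file in Excel and it isn't that big, why not just do this in Excel? Can I make use of all the tools I have to get the job done as quickly as possible? Can I finish this task in 30 seconds?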
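To make the slicing point concrete, here is a small sketch of how the slice bounds behave and how one row from the question could be summarized. The row values and the helper name `summarize_row` are invented for illustration and do not come from the original post.

```python
# Hypothetical row as csv.reader would yield it: a name followed by
# three scores, all as strings (assumption, the real file is not shown).
row = ['John', '7', '9', '10']

print(row[1:3])  # end-exclusive, only two scores: ['7', '9']
print(row[1:4])  # all three score columns: ['7', '9', '10']
print(row[1:])   # everything after the name, same result here


def summarize_row(row):
    """Return [name, maximum, rounded average] for one row (illustrative helper)."""
    scores = [int(s) for s in row[1:]]  # convert the strings before doing math
    return [row[0], max(scores), round(sum(scores) / len(scores))]


print(summarize_row(row))  # ['John', 10, 9]
```

Dividing by `len(scores)` instead of a hard-coded 3 keeps the average correct if the number of score columns ever changes.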

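Answer 1 (score: 2)

Here is one way to do it. Look at it in two parts. First, we create a dictionary with the names as keys and a list of results as values.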

import csv


fileLineList = []
averageScoreDict = {}

with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

for row in fileLineList:
    highest = 0
    lowest = None  # None marks "no score seen yet", so a real score of 0 is kept
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if lowest is None or column < lowest:
                lowest = column
            total += column
    average = total / 3
    averageScoreDict[row[0]] = [highest, lowest, round(average)]

print(averageScoreDict)

Output:

{'Milky': [7, 4, 5], 'Billy': [6, 5, 6], 'Adam': [5, 2, 4], 'John': [10, 7, 9]}

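Now that we have the dictionary, we can create the desired final output by building a list from it and sorting that list. See this updated code: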

import csv
from operator import itemgetter


fileLineList = []
averageScoreDict = {} # Creating an empty dictionary here.

with open('Class1scores.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        fileLineList.append(row)

for row in fileLineList:
    highest = 0
    lowest = None  # None marks "no score seen yet", so a real score of 0 is kept
    total = 0
    average = 0
    for column in row:
        if column.isdigit():
            column = int(column)
            if column > highest:
                highest = column
            if lowest is None or column < lowest:
                lowest = column
            total += column
    average = total / 3
    # Here is where we put the empty dictionary created earlier to good use.
    # We assign the key, in this case the contents of the first column of
    # the CSV, to the list of values.
    # For the first line of the file, the key would be 'John'.
    # We are assigning a list to John which is 3 integers:
    #   highest, lowest and average (which is a float we round)
    averageScoreDict[row[0]] = [highest, lowest, round(average)]

averageScoreList = []

# Here we "unpack" the dictionary we have created and build a list of
# [name, average] pairs: the key is the name and value[2] is the average.
for key, value in averageScoreDict.items():
    averageScoreList.append([key, value[2]])

# Sorting the list using the value instead of the name.
averageScoreList.sort(key=itemgetter(1), reverse=True)    

print('\nStudents Average Scores From Highest to Lowest\n')
print(averageScoreList)

Output:

Students Average Scores From Highest to Lowest

[['John', 9], ['Billy', 6], ['Milky', 5], ['Adam', 4]]
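As a possible shortcut, the intermediate list can be skipped by sorting the dictionary items directly with `sorted()`. This is a sketch rather than the answer's original code: the scores in `sample_csv` are invented so the example runs without Class1scores.csv, and `average_scores` is a hypothetical helper name.

```python
import csv
import io

# Invented sample data standing in for Class1scores.csv
# (assumption: each line is a name followed by three integer scores).
sample_csv = "John,10,7,9\nAdam,5,2,4\nBilly,6,5,6\nMilky,7,4,5\n"


def average_scores(csvfile):
    """Map each name to the rounded average of its scores (illustrative helper)."""
    averages = {}
    for row in csv.reader(csvfile):
        scores = [int(value) for value in row[1:]]
        averages[row[0]] = round(sum(scores) / len(scores))
    return averages


averages = average_scores(io.StringIO(sample_csv))

# Sort the (name, average) pairs by the average, highest first.
ranking = sorted(averages.items(), key=lambda item: item[1], reverse=True)

print('\nStudents Average Scores From Highest to Lowest\n')
for name, average in ranking:
    print(name, average)
```

With a real file, `io.StringIO(sample_csv)` would be replaced by a file object opened with `open('Class1scores.csv', newline='')`.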