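Finding averages from CSV columns whose header contains a specific character

Time: 2015-09-21 17:11:17

Tags: python csv sum average

I am trying to write a simple Python function that reads a CSV file and finds the averages of columns and rows. The function should examine the first row and, for every column whose header starts with the letter 'Q', compute that column's average and print it to the screen. Then, for each row of data, it should compute the student's average over all items in the columns starting with 'Q'. It should compute this average both normally and with the lowest quiz dropped, and print both values for each student.

The CSV file contains student grades, like this: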

       hw1   hw2    Quiz3 hw4   Quiz2   Quiz1
john    87    98    76    67    90      56
marie   45    67    65    98    78      67
paul    54    64    93    28    83      98
fred    67    87    45    98    56      87
This is my code so far, but I don't know how to continue:

import csv


def practice():
    newlist=[]
    afile= input('enter file name')
    a = open(afile, 'r')
    reader = csv.reader(a, delimiter = ",")

    for each in reader:
        newlist.append(each)
    y=sum(int(x[2] for x in reader))
    print (y)

    filtered = []
    total = 0

    for i in range (0,len(newlist)):
        if 'Q' in [i][1]:
            filtered.append(newlist[i])
    return filtered

2 Answers:

Answer 0: (score: 1)

I would suggest using pandas:

>>> import pandas as pd
>>> data = pd.read_csv('file.csv', sep=r'\s+')  # columns are separated by runs of whitespace
>>> q_columns = [name for name in data.columns if name.startswith('Q')]

>>> reduced_data = data[q_columns].copy()
>>> reduced_data.mean()
Quiz3    69.75
Quiz2    76.75
Quiz1    77.00
dtype: float64

>>> reduced_data.mean(axis=1)
john     74.000000
marie    70.000000
paul     91.333333
fred     62.666667
dtype: float64

>>> import numpy as np
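>>> # Blank out each student's lowest quiz; .items() and .loc replace the removed .iteritems() and .ix
>>> for index, column in reduced_data.idxmin(axis=1).items():
...     reduced_data.loc[index, column] = np.nan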
>>> reduced_data.mean(axis=1)
john     83.0
marie    72.5
paul     95.5
fred     71.5
dtype: float64
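
For readers who would rather not depend on pandas, the same three results can be sketched with only the standard library. This is a rough sketch, not the answerer's method: the names `SAMPLE` and `quiz_stats` are made up for illustration, the data is the table from the question inlined as a string, and it assumes whitespace-separated columns with no header over the name column and at least two quiz columns.

```python
from statistics import mean

# Sample data copied from the question: whitespace-separated,
# with no header field above the student-name column.
SAMPLE = """\
       hw1   hw2    Quiz3 hw4   Quiz2   Quiz1
john    87    98    76    67    90      56
marie   45    67    65    98    78      67
paul    54    64    93    28    83      98
fred    67    87    45    98    56      87
"""


def quiz_stats(text):
    """Return (column_means, row_means, row_means_lowest_dropped) for 'Q' columns."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    # Data rows start with the student name, so data index = header index + 1.
    quiz_idx = [i for i, name in enumerate(header) if name.startswith("Q")]
    scores = {row[0]: [int(row[i + 1]) for i in quiz_idx] for row in rows}

    column_means = {
        header[i]: mean(vals[pos] for vals in scores.values())
        for pos, i in enumerate(quiz_idx)
    }
    row_means = {name: mean(vals) for name, vals in scores.items()}
    # sorted(vals)[1:] drops one lowest score; assumes at least two quizzes.
    dropped = {name: mean(sorted(vals)[1:]) for name, vals in scores.items()}
    return column_means, row_means, dropped


column_means, row_means, dropped = quiz_stats(SAMPLE)
print(column_means)
print(row_means)
print(dropped)
```

To use it on a real file, the text could be read with `open(filename).read()` and passed to `quiz_stats`.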

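Answer 1: (score: 0)

The code gets simpler if you change the .csv format to comma-separated values with a header for the name column. Then we can easily use DictReader.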

grades.csv:

name,hw1,hw2,Quiz3,hw4,Quiz2,Quiz1
john,87,98,76,67,90,56
marie,45,67,65,98,78,67
paul,54,64,93,28,83,98
fred,67,87,45,98,56,87

Code:

import numpy as np
from collections import defaultdict
import csv

# Map each student name to the list of that student's quiz scores
result = defaultdict(list)
with open('grades.csv', 'r', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for k in row:
            # Only columns whose header starts with 'Q' are quizzes
            if k.startswith('Q'):
                result[row['name']].append(int(row[k]))
for name, lst in result.items():
    # sorted(lst)[1:] drops the lowest quiz before averaging
    print(name, np.mean(sorted(lst)[1:]))

Output:

john 83.0
marie 72.5
paul 95.5
fred 71.5
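
The loop above prints only the average with the lowest quiz dropped, while the question also asks for the per-column averages and the plain per-student average. One possible way to collect all of them in a single pass is sketched below. The names `GRADES_CSV` and `summarize` are hypothetical, the contents of grades.csv are inlined through `io.StringIO` so the sketch runs on its own, and `statistics.mean` is swapped in for `np.mean` to stay within the standard library.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Same content as grades.csv, inlined so the sketch is self-contained.
GRADES_CSV = """\
name,hw1,hw2,Quiz3,hw4,Quiz2,Quiz1
john,87,98,76,67,90,56
marie,45,67,65,98,78,67
paul,54,64,93,28,83,98
fred,67,87,45,98,56,87
"""


def summarize(csvfile):
    """Collect quiz scores per column and per student in one pass."""
    by_column = defaultdict(list)
    by_student = {}
    for row in csv.DictReader(csvfile):
        # Keep only the columns whose header starts with 'Q'.
        quizzes = {k: int(v) for k, v in row.items() if k.startswith("Q")}
        by_student[row["name"]] = list(quizzes.values())
        for column, score in quizzes.items():
            by_column[column].append(score)
    return by_column, by_student


by_column, by_student = summarize(io.StringIO(GRADES_CSV))

# Average of each quiz column.
for column, scores in by_column.items():
    print(column, mean(scores))

# For each student: the plain average, then the average with the lowest quiz dropped.
for name, scores in by_student.items():
    # With a single quiz there is nothing to drop, so keep every score.
    kept = sorted(scores)[1:] if len(scores) > 1 else scores
    print(name, mean(scores), mean(kept))
```

To read from disk instead, `summarize(open('grades.csv', newline=''))` should work the same way, since `csv.DictReader` accepts any iterable of lines.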