Python - Average of unique values

Time: 2017-03-12 20:45:13

Tags: python python-3.x csv

I have a CSV file that looks like this:

spend

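It is a list of temperatures on specific dates. It covers several years of data, so the same date appears multiple times. I want to average the temperatures so that I get a new table in which each date occurs only once, with the average temperature for that date in the second column.

I know Stack Overflow asks you to include what you have tried, but I really don't know how to approach this, and I could not find any other answers.

I hope someone can help. Any help is much appreciated.

2 answers:

Answer 0 (score: 4)

With df as your data frame, you can use pandas and run a groupby command: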

df.groupby('DATE').mean()
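As a minimal sketch of how df might be built from the CSV before grouping: the column names DATE and TEMP and the sample rows below are assumptions, since the question's data is not shown here. The text is read from an in-memory string so that the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample data; the DATE and TEMP column names are assumptions.
csv_text = """DATE,TEMP
0101,39.0
0102,40.9
0101,41.0
0102,42.9
"""

# dtype=str keeps dates such as "0101" as text, so leading zeros survive.
df = pd.read_csv(io.StringIO(csv_text), dtype={"DATE": str})

# One entry per date, holding the mean of that date's temperatures.
averages = df.groupby("DATE")["TEMP"].mean()
print(averages)
```

For a real file, pd.read_csv accepts a path in place of the StringIO object, and averages.to_csv can write the result out as a new table.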

Here is a toy example that illustrates the behavior:

import pandas as pd
df=pd.DataFrame({"a":[1,2,3,1,2,3],"b":[1,2,3,4,5,6]})
df.groupby('a').mean()

which results in

     b
a
1  2.5
2  3.5
3  4.5

when the original data frame is

    a   b
0   1   1
1   2   2
2   3   3
3   1   4
4   2   5
5   3   6
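In the grouped result the key a becomes the index rather than an ordinary column. If a flat two-column table is wanted, as the question describes, one option is to call reset_index on the result. A short sketch using the same toy data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3], "b": [1, 2, 3, 4, 5, 6]})

# groupby(...).mean() puts the group key in the index;
# reset_index() moves it back into a regular column.
flat = df.groupby("a").mean().reset_index()
print(flat)
```

Passing as_index=False to groupby should give the same layout without the extra call.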

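Answer 1 (score: 1)

If you can use defaultdict from the collections module, this kind of thing is easy.

Assuming your list is in the same directory as the Python script, and it looks like this: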

list.csv:

DATE,TEMP
0101,39.0
0102,40.9
0103,44.4
0104,41.0
0105,40.0
0106,42.2
0101,39.0
0102,40.9
0103,44.4
0104,41.0
0105,40.0
0106,42.2

Here is the code I used to print the averages.

#test.py
#usage: python test.py list.csv
import sys
from collections import defaultdict

#Open the file named by the first command-line argument
with open(sys.argv[1]) as File:

    #Skip the first line of the file, if it's just the "DATE,TEMP" header
    next(File)

    #Create a dictionary of lists
    ourDict = defaultdict(list)

    #Parse the file, line by line
    for each in File:
        # Split the line on the comma,
        # or whatever separates the fields (Comma Separated Values = CSV)
        each = each.split(',')

        # Now each[0] is a date, and each[1] is a value.
        # We use each[0] as the key, and append values to the list
        ourDict[each[0]].append(float(each[1]))

    print("Date\tValue")
    for key, value in ourDict.items():
        # The average is the sum of all members of the list
        # divided by the list's length
        print(key, '\t', sum(value) / len(value))
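Since the question asks for a new table rather than printed output, the same grouping idea can be sketched with the standard csv and statistics modules. The helper name average_by_date, the output column name AVG_TEMP, and the sample rows are hypothetical, and in-memory buffers stand in for real files so that the example runs on its own.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_by_date(rows):
    """Group TEMP values by DATE and return {date: mean temperature}."""
    temps = defaultdict(list)
    for row in rows:
        temps[row["DATE"]].append(float(row["TEMP"]))
    return {date: mean(values) for date, values in temps.items()}


# Hypothetical sample; with a real file, pass open("list.csv", newline="").
sample = io.StringIO("DATE,TEMP\n0101,39.0\n0102,40.9\n0101,41.0\n")
averages = average_by_date(csv.DictReader(sample))

# Write one row per date, sorted by date, to a new CSV table.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["DATE", "AVG_TEMP"])
for date in sorted(averages):
    writer.writerow([date, averages[date]])
print(out.getvalue())
```

csv.DictReader takes the column names from the header row, so no manual header skipping or splitting on commas is needed.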