Question

I have a CSV file that looks like this:

spend

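It is a list of temperatures for specific dates. It contains several years of data, so the same date appears multiple times. I would like to average the temperatures so that I get a new table in which each date occurs only once, with the average temperature for that date in the second column.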

I know Stack Overflow asks you to include what you have tried, but I really don't know how to do this and couldn't find any other answers on it.

I hope someone can help. Any help is much appreciated.

Answer 1

You can use pandas and run the groupby command, where df is your data frame:

df.groupby('DATE').mean()

Here is a toy example that illustrates the behavior:

import pandas as pd
df=pd.DataFrame({"a":[1,2,3,1,2,3],"b":[1,2,3,4,5,6]})
df.groupby('a').mean()

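This returns one row per distinct value of a, with column b holding the mean of b within that group: 2.5 for a = 1, 3.5 for a = 2, and 4.5 for a = 3. The original data frame has six rows, with a cycling through 1, 2, 3 twice and b running from 1 to 6.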
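Applied to the temperature file, the same groupby call could look like the sketch below. It assumes the columns are named DATE and TEMP (as in the sample file in Answer 2) and uses a small made-up inline sample in place of the real file, so treat the data and names as illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names DATE and TEMP are assumed.
csv_text = """DATE,TEMP
0101,39.0
0102,40.9
0101,41.0
0102,42.9
"""

# Read DATE as a string so leading zeros such as "0101" are preserved.
df = pd.read_csv(io.StringIO(csv_text), dtype={"DATE": str})

# One row per date, with the mean temperature for that date.
averages = df.groupby("DATE", as_index=False)["TEMP"].mean()
print(averages)
```

With a real file, the path would be passed to pd.read_csv instead of the StringIO object, and averages.to_csv("averages.csv", index=False) would write the new table back out.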

Answer 2

If you can use defaultdict from the collections package, this kind of thing is easy.

Assuming your list is in the same directory as the Python script, and it looks like this:

list.csv：

DATE,TEMP
0101,39.0
0102,40.9
0103,44.4
0104,41.0
0105,40.0
0106,42.2
0101,39.0
0102,40.9
0103,44.4
0104,41.0
0105,40.0
0106,42.2

Here is the code I used to print the averages.

#test.py
#usage: python test.py list.csv
import sys
from collections import defaultdict

# Open the file named by the first command-line argument (sys.argv[1])
with open(sys.argv[1]) as File:

    # Skip the first line of the file (the "DATE,TEMP" header)
    next(File)

    # Create a dictionary of lists
    ourDict = defaultdict(list)

    # Parse the file, line by line
    for each in File:
        # Split the line by a comma,
        # or whatever separates them (Comma Separated Values = CSV)
        each = each.split(',')

        # Now each[0] is a date, and each[1] is a value.
        # We use each[0] as the key, and append values to the list
        ourDict[each[0]].append(float(each[1]))

    print("Date\tValue")
    for key, value in ourDict.items():
        # Average is the sum of the values of all members of the list
        # divided by the list's length
        print(key, '\t', sum(value) / len(value))
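The same idea can also be written with csv.DictReader and statistics.mean from the standard library, which handle the header row and the averaging for you. This is only a sketch: the function name average_by_date is made up for illustration, the header names DATE and TEMP are assumed from the sample file above, and the inline data is a shortened made-up sample.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def average_by_date(lines):
    """Return {date: mean temperature} for CSV rows with DATE and TEMP columns."""
    temps = defaultdict(list)
    # DictReader consumes the header line and yields one dict per data row.
    for row in csv.DictReader(lines):
        temps[row["DATE"]].append(float(row["TEMP"]))
    return {date: mean(values) for date, values in temps.items()}


# Hypothetical in-memory sample; an open file object works the same way.
sample = io.StringIO("DATE,TEMP\n0101,39.0\n0102,40.9\n0101,41.0\n0102,42.9\n")
result = average_by_date(sample)
for date, avg in sorted(result.items()):
    print(f"{date}\t{avg:.2f}")
```

To run it on a file, pass the object returned by open("list.csv", newline="") instead of the StringIO sample.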

Python - average of unique values
