I have a CSV file with a date column and a temperature column.
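It is a list of temperatures for specific dates. It covers several years of data, so the same date appears multiple times. I want to average the temperatures so that I get a new table in which each date appears only once, with that date's average temperature in the second column.
I know Stack Overflow asks you to include what you have tried, but I really don't know how to do this, and I couldn't find any other answers.
I hope someone can help. Any help is much appreciated.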
Answer 0 (score: 4)
If df is your DataFrame, you can use pandas and run groupby:
df.groupby('DATE').mean()
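Applied to the question's file, a minimal sketch might look like the following. It assumes the columns are named DATE and TEMP (as in the sample list.csv in the next answer), and io.StringIO stands in for the real file so the example is self-contained. Reading DATE as a string keeps leading zeros such as 0101, and as_index=False keeps the date as an ordinary column, which gives the two-column table the question asks for.

```python
import io

import pandas as pd

# Stand-in for the real file; replace io.StringIO(csv_text) with a path such as "list.csv"
csv_text = """DATE,TEMP
0101,39.0
0102,40.9
0101,41.0
0102,42.9
"""

# dtype=str for DATE keeps leading zeros ("0101" instead of 101)
df = pd.read_csv(io.StringIO(csv_text), dtype={"DATE": str})

# One row per date; as_index=False keeps DATE as a column rather than the index
averages = df.groupby("DATE", as_index=False)["TEMP"].mean()
print(averages)

# to_csv with no path returns the CSV text; pass a filename to write a file instead
print(averages.to_csv(index=False))
```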
Here is a toy example that illustrates the behavior:
import pandas as pd
df=pd.DataFrame({"a":[1,2,3,1,2,3],"b":[1,2,3,4,5,6]})
df.groupby('a').mean()
produces
     b
a
1  2.5
2  3.5
3  4.5
when the original data frame is
   a  b
0 1 1
1 2 2
2 3 3
3 1 4
4 2 5
5 3 6
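In the output above, a has become the index of the result rather than an ordinary column. If you want it back as a regular column (for example, before writing the table to CSV), reset_index() does that, and passing as_index=False to groupby has the same effect. A small sketch with the same toy data:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 1, 2, 3], "b": [1, 2, 3, 4, 5, 6]})

# Grouped means: "a" becomes the index of the result
by_index = df.groupby("a").mean()

# Move "a" back into an ordinary column
flat = by_index.reset_index()

# Equivalent: ask groupby not to use the group keys as the index
flat_too = df.groupby("a", as_index=False).mean()

print(flat)
```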
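Answer 1 (score: 1)
This kind of thing is easy if you can use defaultdict from the collections module.
Assuming your list is in the same directory as your Python script and looks like this: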
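The point of defaultdict(list) is that looking up a missing key creates an empty list for it automatically, so values can be appended without first checking whether the date has been seen. A small illustration with made-up readings:

```python
from collections import defaultdict

readings = [("0101", 39.0), ("0102", 40.9), ("0101", 41.0)]

temps_by_date = defaultdict(list)
for date, temp in readings:
    # A missing key is created with an empty list on first access,
    # so no "if date not in temps_by_date" check is needed
    temps_by_date[date].append(temp)

print(dict(temps_by_date))  # {'0101': [39.0, 41.0], '0102': [40.9]}
```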
list.csv:
DATE,TEMP
0101,39.0
0102,40.9
0103,44.4
0104,41.0
0105,40.0
0106,42.2
0101,39.0
0102,40.9
0103,44.4
0104,41.0
0105,40.0
0106,42.2
Here is the code I used to print the averages.
# test.py
# usage: python test.py list.csv
import sys
from collections import defaultdict

# Open the file whose name is given as the first command-line argument
with open(sys.argv[1]) as File:
    # Skip the header line ("DATE,TEMP")
    next(File)
    # Create a dictionary that maps each date to a list of temperatures
    ourDict = defaultdict(list)
    # Parse the file line by line
    for each in File:
        # Split the line on the comma
        # (or whatever separates the fields; CSV = Comma-Separated Values)
        each = each.split(',')
        # Now each[0] is a date and each[1] is a temperature.
        # Use each[0] as the key and append the value to its list
        ourDict[each[0]].append(float(each[1]))

print("Date\tValue")
for key, value in ourDict.items():
    # The average is the sum of the list's members divided by its length
    print(f"{key}\t{sum(value) / len(value)}")
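A possible variation on the script above, sketched with the standard csv module: csv.DictReader handles the header row by name (assuming the columns are called DATE and TEMP, as in list.csv), and each date keeps only a running sum and count instead of a full list of readings. The function name average_by_date is hypothetical, and io.StringIO stands in for the real file.

```python
import csv
import io
from collections import defaultdict


def average_by_date(csv_file):
    """Return {date: mean temperature} from a file-like object with DATE,TEMP columns."""
    # Each date maps to [running sum, count]
    totals = defaultdict(lambda: [0.0, 0])
    for row in csv.DictReader(csv_file):
        entry = totals[row["DATE"]]
        entry[0] += float(row["TEMP"])
        entry[1] += 1
    return {date: s / n for date, (s, n) in totals.items()}


sample = io.StringIO("DATE,TEMP\n0101,39.0\n0102,40.9\n0101,41.0\n")
result = average_by_date(sample)

# Write the averaged table as a new CSV, one row per date
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["DATE", "TEMP"])
for date in sorted(result):
    writer.writerow([date, result[date]])
print(out.getvalue())
```

Keeping only a sum and a count means memory per date stays constant no matter how many years of readings the file holds; with a real file, open it with newline="" as the csv module recommends.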