Pandas groupby method

Time: 2016-09-10 13:14:38

Tags: python pandas

I have a dataset covering 41 years and I want to run some statistical calculations on it with the Pandas module. However, I don't know Pandas well. Here is a sample of the csv file dataset:

Time
1
1
2
2.6
2
2
8.81
3.01
3
5.56
1.6
6.6

Here is my code:

import numpy as np
import pandas as pd
import csv

filename="output813b.csv"
cols = ["date","year","month","day" ,"pcp1","pcp2","pcp3","pcp4","pcp5","pcp6"]
data1=pd.read_csv(filename,sep=',', header=None,names=cols,usecols=range(1,9))
colmns_needed=["month" ,"pcp1","pcp2","pcp3","pcp4","pcp5","pcp6"]
data2=pd.read_csv(filename,sep=',', header=None,names=colmns_needed)
mm=data2.groupby("month")
print(mm.sum())
print('\n')

But the values under the PCP columns seem to be stored as strings. Here is sample output for pcp1:

Month pcp1
1 0.4310.4720000.91800000.01011.63904.65900.5780...
10 00.1500000000.027000.02400.1630.9610000000.017...
11 00.4940000000000.0480.003012.26200000003.612.9...
12 0.1890.0760.47000000000.08800.1080.26107.15000...
13 00.06500.1060.00700000050.6207.1510.0860.1487....
14 0000.64200000000.017025.5910.93400.04500000000...
15 0.742000.0720000000000.32500000000002.9877.512...
16 6.43900000000000.38103.986000000000033.5534.76...
17 0.0890000.2750000.555001.9230.562.9130.1360000...
18 3.28200000000.024000.656002.1750000000008.2434...
19 1.28200000000000000.0070000000007.0383.0450.17...
2 1.2160.1050000000010.4690.2092.9700.0415.6062....
20 00.4960.05100000000000.3550.1582.8530.04600000...
21 00000000000002.69903.5190.13000002.830.5151.09...
22 0000000007.19600000000000001.4421.76500.04500....
23 0000000008.168000.02100000000000.1083.8760.968...

How can I solve this problem?

1 answer:

Answer 0 (score: 2)

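Do not specify header=None in your read_csv call. That tells the function there is no header row in the data, but judging by the sample data you posted above, the first line of the file is a header. As a result, the header row is treated as data, which mixes strings such as pcp1 in with the numeric values and causes every column to be interpreted as strings.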
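A minimal sketch of that fix, assuming a small made-up CSV in place of output813b.csv (the inline data and the reduced set of columns are hypothetical, not the asker's real file). One caveat: when `names` is passed, pandas behaves as if `header=None` even if you leave that argument out, so the sketch passes `header=0` explicitly so that the supplied names replace the file's header row.

```python
import io

import pandas as pd

# Hypothetical stand-in for output813b.csv: a header row followed by data.
csv_text = """date,year,month,day,pcp1,pcp2
1975-01-01,1975,1,1,0.5,0.0
1975-01-02,1975,1,2,1.5,2.0
1975-02-01,1975,2,1,0.25,1.0
"""

cols = ["date", "year", "month", "day", "pcp1", "pcp2"]

# Problem case: header=None turns the header row into a data row,
# so "pcp1" sits next to the numbers and the column is read as text.
bad = pd.read_csv(io.StringIO(csv_text), header=None, names=cols)

# Fix: header=0 marks the first line as the header (replaced by `names`),
# and usecols keeps only the columns needed for the monthly totals.
good = pd.read_csv(
    io.StringIO(csv_text),
    header=0,
    names=cols,
    usecols=["month", "pcp1", "pcp2"],
)

# Sum each precipitation column per month.
monthly = good.groupby("month").sum()

print(pd.api.types.is_numeric_dtype(bad["pcp1"]))   # text column in the broken read
print(pd.api.types.is_numeric_dtype(good["pcp1"]))  # numeric column in the fixed read
print(monthly)
```

Once the columns are numeric, `groupby("month").sum()` adds the values instead of joining strings end to end, which is likely what produced the run-together digits in the question's output. If you do not need custom column names, dropping both `names` and `header` and letting pandas infer the header should work as well.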