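Grouping data in a csv file by month

Date: 2014-05-14 06:20:08

Tags: python csv

I am trying to create a simple interface that lets the user select a csv file and then displays the data by month. This is the code I have come up with so far: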

import math
import numpy as np
import matplotlib.pyplot as plt
from pylab import *
from matplotlib.font_manager import FontProperties
from tkinter import *
from tkinter import filedialog, messagebox
from matplotlib.backends.backend_tkagg import FigureCanvasTkAgg,NavigationToolbar2TkAgg
import datetime
import random
from dateutil.relativedelta import relativedelta
from itertools import accumulate

root = Tk()
f = Frame(root)
fig=plt.figure(1)
fig2=plt.figure(2)
kit=[]
lau=[]
air=[]
oth=[]
mon=[]
accu=[]
tot=[]

def readCSV():

    del kit[:],lau[:],air[:],oth[:],mon[:],tot[:],accu[:]    

    filename =filedialog.askopenfilename()
    try:
        f = open(filename,"r")
        data=plt.mlab.csv2rec(f,delimiter =",")
        for row in data:
            mon.append(row[0])
            kit.append(row[1])
            lau.append(row[2])
            air.append(row[3])
            oth.append(row[4])
            accu.append(row[5])
            tot.append(row[6])




    except (IOError,UnboundLocalError,IndexError):
        messagebox.showinfo( "Error", "Invalid or file not found")   

def graph():
    #Original Chart
    plt.figure(1)
    plt.clf()
    kitchen=np.array(kit)
    laundry=np.array(lau)
    aircon=np.array(air)
    other=np.array(oth)
    ind=np.arange(11)+0.75
    width=0.75


    p1=plt.bar(ind,kitchen,width,color="cyan")
    p2=plt.bar(ind,laundry,width,color="purple",bottom=kitchen)
    p3=plt.bar(ind,aircon,width,color="green",bottom=kitchen+laundry)
    p4=plt.bar(ind,other,width,color="red",bottom=kitchen+laundry+aircon)


    plt.ylabel("KWH")
    plt.ylim(0,1200)
    datee=[]
    for dt in mon:
        datee.append(dt.strftime("%b/%y"))    
    plt.xticks(ind+width/2,datee,rotation=70)
    fontP = FontProperties()
    fontP.set_size('small')
    plt.title('Actual Monthly Consumption')
    plt.tight_layout()
    plt.legend((p1[0],p2[0],p3[0],p4[0]),('kitchen','laundry','aircon&heater','other'),'best',prop=fontP)
    canvas = FigureCanvasTkAgg(fig, master=c)
    canvas.show()
    toolbar = NavigationToolbar2TkAgg(canvas, root)
    canvas.get_tk_widget().grid(row=0,column=1)
    toolbar.grid(row=2,column=0)
f.grid(row=0,column=0)
#place buttons on the *frame*
var = StringVar()
label = Label( f, textvariable=var,bg="pink")
var.set("Select Database\n(Csv file)")
label.grid(row=0,column=0,ipadx=1,sticky=W)
label.config(height=2,width=15)

b1 =Button(f, text ="Csv\nselector", command = readCSV,bg='cyan')
b1.grid(row=1,column=0,ipadx=1,padx=5)


b2 =Button(f,text="Summary",command=graph,bg='orange')

b2.grid(row=1,column=1,ipadx=1,padx=5)
width =350
height=250
c = Canvas(root, width=width, height=height, bg='gray')


c.grid(row=1,column=0) 

So far the interface works with a csv file that is already organized by month:

Date,Kitchen,Laundry,Aircon&heater,Others,Accumulative,Total
January/2010,53.887,56.568,395.913,483.293,989.661,989.661
February/2010,49.268,53.590,411.714,409.956,1914.1894,924.528
March/2010,35.089,60.872,324.352,382.285,2716.7877,802.598
April/2010,38.196,36.476,336.091,328.872,3456.4231,739.635
May/2010,48.107,52.376,364.625,349.765,4271.296433,814.873
June/2010,65.747,47.675,306.934,277.734,4969.386833,698.090
July/2010,17.667,34.359,192.912,291.525,5505.849367,536.463
August/2010,12.499,26.983,160.189,168.719,5874.238933,368.390
September/2010,36.865,32.508,257.861,277.923,6479.396,605.157
October/2010,48.199,60.220,315.669,441.461,7344.945233,865.549
November/2010,45.082,41.897,237.124,394.402,8063.449967,718.505

But I have been told that the file will be in this format:

Date,Total (Kwh),Kitchen_Accumulative(KWh),LaundaryRoom_Accumulative(KWh),Air-Con_Accumulative (KWh),Others (KWh)
1/1/2007,45.817,0.000,0.352,5.880,39.585
2/1/2007,21.154,0.000,0.348,6.562,14.244
3/1/2007,16.901,0.000,0.344,4.765,11.792
4/1/2007,54.324,1.051,7.597,10.896,34.780
5/1/2007,45.223,1.483,0.379,7.602,35.759
6/1/2007,25.140,1.336,0.402,5.678,17.724
7/1/2007,40.794,1.987,8.177,12.810,17.820
8/1/2007,37.356,0.000,0.467,17.547,19.342
9/1/2007,31.151,1.688,4.267,9.790,15.406
10/1/2007,35.913,0.771,4.456,11.012,19.674
11/1/2007,37.587,1.378,2.170,12.415,21.624
12/1/2007,24.355,0.000,0.439,8.276,15.640
13/1/2007,53.114,7.806,2.975,11.341,30.992
14/1/2007,50.130,1.777,4.215,12.975,31.163
15/1/2007,35.811,1.099,2.239,15.163,17.310
16/1/2007,28.107,2.063,0.644,6.583,18.817 
...

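and so on, up to 31 December 2007.

The user should therefore be able to display the chart either by month (as in my code) or by day (for a single month), using 2 extra buttons labelled "Month" and "Day".

My question is: how can I gather the data into lists by month, so that I can sum it and display either a 12-month chart or a single month (30 days)? I have no idea how to read only certain rows of a csv file.

The first file was easy because all I had to do was loop over it and append the values, but that obviously does not work for the second file.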

2 Answers:

Answer 0 (score: 3)

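Things you will probably need, as rough pseudocode:

  1. Read your csv data into a list.
  2. Use itertools.groupby to turn that list into groups, using your date as the key.
  3. Do whatever else you want with the data...

Update: an example of using itertools.groupby() (note that it only groups consecutive items, so the list must already be sorted by the key):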

    >>> from itertools import groupby
    >>> xs = [
    ...     (("1", "Sep", "2013"), 123.4),
    ...     (("15", "Sep", "2013"), 234.0),
    ...     (("1", "Oct", "2014"), 456.0),
    ...     (("15", "Oct", "2014"), 778.0),
    ... ]
    >>> group_by_month = lambda x: x[0][1]
    >>> groups = groupby(xs, group_by_month)
    >>> key, group  = next(groups)
    >>> key, list(group)
    ('Sep', [(('1', 'Sep', '2013'), 123.4), (('15', 'Sep', '2013'), 234.0)])
    >>> key, group  = next(groups)
    >>> key, list(group)
    ('Oct', [(('1', 'Oct', '2014'), 456.0), (('15', 'Oct', '2014'), 778.0)])
    >>> key, group  = next(groups)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    StopIteration
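
Applied to the daily file from the question, the three steps above might look like the sketch below. It assumes that the dates are in day/month/year order (as rows such as 13/1/2007 suggest), that the first line is a header, and that only the first value column (Total) is being summed. The names read_rows and monthly_totals are illustrative and do not come from the original code.

```python
import csv
from datetime import datetime
from itertools import groupby


def read_rows(lines):
    """Step 1: read the csv into a list of (date, [float, ...]) tuples."""
    reader = csv.reader(lines)
    next(reader)  # skip the header line
    rows = []
    for record in reader:
        if not record:
            continue  # ignore blank lines
        # Assumption: dates are day/month/year, e.g. 13/1/2007.
        date = datetime.strptime(record[0].strip(), "%d/%m/%Y")
        rows.append((date, [float(value) for value in record[1:]]))
    return rows


def monthly_totals(rows):
    """Steps 2 and 3: group by month, then sum the first value column."""
    # groupby only merges adjacent rows with the same key, so sort first.
    rows = sorted(rows, key=lambda row: row[0])
    totals = {}
    for month, group in groupby(rows, key=lambda row: row[0].month):
        totals[month] = sum(values[0] for _, values in group)
    return totals
```

read_rows accepts any iterable of lines, so an open file object works as well as a list of strings. The month number alone is enough as a key here because the file covers a single year; for data spanning several years, a (year, month) tuple would be a safer key.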
    

Answer 1 (score: 0)

First create a dictionary with the months as keys and empty lists as values, like this:

months = {str(m): [] for m in range(1, 13)}  # {'1': [], '2': [], ..., '12': []}

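Then read the CSV file line by line (skipping the first line) and split each line with the string method split(','). Extract the month from the first element of the split list and add the remaining values to the matching list in your dictionary, like this: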

months[month].extend(splited_list[1:])
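
Put together, the dictionary approach could be sketched as follows. It assumes the day/month/year date format from the question, so the month is the middle field of the date. The helper name group_by_month is hypothetical, and the sketch uses append rather than extend so that each day stays a separate row, which is a deliberate change from the line above.

```python
def group_by_month(lines):
    """Return a dict mapping '1'..'12' to a list of per-day value lists."""
    months = {str(m): [] for m in range(1, 13)}
    lines = iter(lines)
    next(lines)  # skip the header line
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        splited_list = line.split(',')
        # Assumption: the date is day/month/year, so the month is field 1.
        # int() normalises '01' to '1' so it matches the dictionary keys.
        month = str(int(splited_list[0].split('/')[1]))
        # append (not extend) keeps one list of floats per day.
        months[month].append([float(v) for v in splited_list[1:]])
    return months
```

With this layout, months['1'] holds one list per day of January, which suits the single-month chart. Summing a column over each month's rows gives the twelve values for the monthly chart.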