Using matplotlib / pandas / Python, I cannot visualize data as values per 30 minutes and per day

Asked: 2015-07-06 11:46:32

Tags: python csv pandas matplotlib

I am analyzing a CSV file with Matplotlib / Python.

This is the CSV file: https://github.com/camenergydatalab/EnergyDataSimulationChallenge/blob/master/challenge2/data/total_watt.csv

After importing the CSV file, I managed to plot a graph showing the energy consumption per 30 minutes. (Thanks, everyone! See: Using Matplotlib, visualize CSV data)

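But the problem is that I cannot visualize the energy consumption per day...

------------ Edit (thanks, Florian!) ------------

I installed pandas and added pandas code to my code.

Now my code looks like this: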

from matplotlib import style
from matplotlib import pylab as plt
import numpy as np

style.use('ggplot')

filename='total_watt.csv'
date=[]
number=[]

import csv
with open(filename, 'rb') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in csvreader:
        if len(row) ==2 :
            date.append(row[0])
            number.append(row[1])

number=np.array(number)

import datetime
for ii in range(len(date)):
    date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')

plt.plot(date,number)

plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')

plt.show()

When I run this code, I get no error, but nothing is plotted in my graph... How can I fix it?

1 Answer:

Answer 0 (score: 3)

Using the pandas resample function will make your life easier.

Data

import io
import pandas as pd
content = '''timestamp  value
2011-04-18 16:52:00     152.684299188514
2011-04-18 17:22:00     327.579073188405
2011-04-18 17:52:00     156.826945856169
2011-04-18 18:22:00     330.202764488018
2011-04-18 18:52:00     1118.60404324133
2011-04-18 19:22:00     243.972250782998
2011-04-18 19:52:00     852.88815851216
2011-04-18 20:22:00     491.859992982456
2011-04-18 20:52:00     466.738983617709
2011-04-18 21:22:00     659.670303375527
2011-04-18 21:52:00     576.304871428571
2011-04-18 22:22:00     2497.20620579196
2011-04-18 22:52:00     2790.20392088608
2011-04-18 23:22:00     1092.20906629318
2011-04-18 23:52:00     825.994417375886
2011-04-19 00:22:00     2397.16672089666
2011-04-19 00:52:00     1411.66659265233
2011-04-19 01:22:00     2379.18391111111
2011-04-19 01:52:00     841.224212511672
2011-04-19 02:22:00     471.520308479532
2011-04-19 02:52:00     1189.78122544232
2011-04-19 03:22:00     343.7574197609
2011-04-19 03:52:00     336.486834795322
2011-04-19 04:22:00     541.401434220355
2011-04-19 04:52:00     316.106452883263
2011-04-19 05:22:00     502.502274561404
2011-04-19 05:52:00     314.832323976608
'''

# Columns are separated by two or more spaces; the raw string avoids an invalid-escape warning
df = pd.read_table(io.BytesIO(content.encode('UTF-8')), sep=r'\s{2,}', parse_dates=[0], index_col=[0], engine='python')
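
For the actual file from the question, loading it directly with pandas might look like the sketch below. It assumes total_watt.csv has no header row and exactly two comma-separated columns (timestamp, value), which is what the question's parsing code suggests. The column names and the load_watt helper are made up for illustration, and the two sample rows are copied from the data above.

```python
import io

import pandas as pd


def load_watt(source):
    """Read a header-less, two-column CSV of (timestamp, value) rows.

    `source` may be a file path such as 'total_watt.csv' or a file-like object.
    The column names are an assumption; the file itself does not name them.
    """
    return pd.read_csv(
        source,
        header=None,                      # assumed: the file has no header row
        names=["timestamp", "value"],     # hypothetical column names
        parse_dates=["timestamp"],        # turn the first column into datetimes
        index_col="timestamp",            # resample() needs a datetime index
    )


# Stand-in for the real file: two rows in the assumed format.
sample = io.StringIO(
    "2011-04-18 16:52:00,152.684299188514\n"
    "2011-04-18 17:22:00,327.579073188405\n"
)
df_sample = load_watt(sample)
print(df_sample)
```

With the real file, load_watt('total_watt.csv') should give a DataFrame that can be passed straight to resample.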

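Using the resample function

See the documentation here: http://pandas-docs.github.io/pandas-docs-travis/

Per 30 minutes

# Older pandas versions spelled this df.resample('30min', how='sum')
df.resample('30min').sum()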
Out[496]: 
                           value
timestamp                       
2011-04-18 16:30:00   152.684299
2011-04-18 17:00:00   327.579073
2011-04-18 17:30:00   156.826946
2011-04-18 18:00:00   330.202764
2011-04-18 18:30:00  1118.604043
2011-04-18 19:00:00   243.972251
2011-04-18 19:30:00   852.888159
2011-04-18 20:00:00   491.859993
2011-04-18 20:30:00   466.738984
2011-04-18 21:00:00   659.670303
2011-04-18 21:30:00   576.304871
2011-04-18 22:00:00  2497.206206
2011-04-18 22:30:00  2790.203921
2011-04-18 23:00:00  1092.209066
2011-04-18 23:30:00   825.994417
2011-04-19 00:00:00  2397.166721
2011-04-19 00:30:00  1411.666593
2011-04-19 01:00:00  2379.183911
2011-04-19 01:30:00   841.224213
2011-04-19 02:00:00   471.520308
2011-04-19 02:30:00  1189.781225
2011-04-19 03:00:00   343.757420
2011-04-19 03:30:00   336.486835
2011-04-19 04:00:00   541.401434
2011-04-19 04:30:00   316.106453
2011-04-19 05:00:00   502.502275
2011-04-19 05:30:00   314.832324

Per day

# Older pandas versions spelled this df.resample('1D', how='sum')
df.resample('1D').sum()
Out[497]: 
                   value
timestamp               
2011-04-18  12582.945297
2011-04-19  11045.629711
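
To make the binning in the outputs above easier to follow, here is a small sketch on toy data. The four values are invented for illustration and are not taken from the CSV. It shows that each timestamp is assigned to the bin that starts at or before it (16:52 lands in the 16:30 bin), and that summing per day gives the same total as summing the 30-minute bins of that day.

```python
import pandas as pd

# Toy data: four readings spanning midnight (values are made up).
index = pd.to_datetime([
    "2011-04-18 16:52:00",
    "2011-04-18 17:22:00",
    "2011-04-18 23:52:00",
    "2011-04-19 00:22:00",
])
small = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)

# 30-minute bins, labelled by the start of each bin.
half_hourly = small.resample("30min").sum()

# One bin per calendar day.
daily = small.resample("1D").sum()

print(half_hourly.head(3))
print(daily)
```

Note that bins with no readings still appear in the 30-minute result, with a sum of 0. In the data from the answer every bin contains exactly one reading, so this does not show up there.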

Plot

Per 30 minutes
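
One possible way to draw both views with matplotlib is sketched below. This is not necessarily the code that produced the original figures. The plot_consumption helper, the "Per day" title, and the small demo frame are assumptions added for illustration; the frame is expected to have a datetime index and a "value" column, like df above.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_consumption(df):
    """Plot 30-minute sums as a line and daily sums as bars (hypothetical helper)."""
    half_hourly = df.resample("30min").sum()
    daily = df.resample("1D").sum()

    fig, (ax_top, ax_bottom) = plt.subplots(2, 1, figsize=(8, 6))

    # Upper panel: one point per 30-minute bin.
    ax_top.plot(half_hourly.index, half_hourly["value"])
    ax_top.set_title("Per 30 minutes")
    ax_top.set_ylabel("value")

    # Lower panel: one bar per day (bar width is given in days).
    ax_bottom.bar(daily.index, daily["value"], width=0.8)
    ax_bottom.set_title("Per day")
    ax_bottom.set_ylabel("value")

    fig.tight_layout()
    return fig


# Demo frame with made-up values, only to show the call.
demo = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime([
        "2011-04-18 16:52:00",
        "2011-04-18 17:22:00",
        "2011-04-18 23:52:00",
        "2011-04-19 00:22:00",
    ]),
)
fig = plot_consumption(demo)
```

Call plt.show() afterwards to display the window, or fig.savefig with a file name of your choice to write an image.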

Hope it helps!