I am analyzing a CSV file with Matplotlib / Python.
After importing the CSV file, I successfully plotted a graph that shows the energy consumption for every 30 minutes. (Thank you all!! Using Matplotlib, visualize CSV data)
But the problem is that I can't visualize the energy consumption per day......
------------ Edit (thank you, Florian!!) ------------
I installed pandas and added pandas code to my script.
Now my code looks like this:
from matplotlib import style
from matplotlib import pylab as plt
import numpy as np
style.use('ggplot')
filename='total_watt.csv'
date=[]
number=[]
import csv
with open(filename, 'rb') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in csvreader:
        if len(row) ==2 :
            date.append(row[0])
            number.append(row[1])
number=np.array(number)
import datetime
for ii in range(len(date)):
    date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')
plt.plot(date,number)
plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
When I run this code, I get no errors. But nothing is drawn in my graph... How can I fix this??
Answer 0 (score: 3)
Using pandas and its resample function can make your life easier.
import io
import pandas as pd
# Sample data: the two columns are separated by two spaces.
content = '''timestamp  value
2011-04-18 16:52:00  152.684299188514
2011-04-18 17:22:00  327.579073188405
2011-04-18 17:52:00  156.826945856169
2011-04-18 18:22:00  330.202764488018
2011-04-18 18:52:00  1118.60404324133
2011-04-18 19:22:00  243.972250782998
2011-04-18 19:52:00  852.88815851216
2011-04-18 20:22:00  491.859992982456
2011-04-18 20:52:00  466.738983617709
2011-04-18 21:22:00  659.670303375527
2011-04-18 21:52:00  576.304871428571
2011-04-18 22:22:00  2497.20620579196
2011-04-18 22:52:00  2790.20392088608
2011-04-18 23:22:00  1092.20906629318
2011-04-18 23:52:00  825.994417375886
2011-04-19 00:22:00  2397.16672089666
2011-04-19 00:52:00  1411.66659265233
2011-04-19 01:22:00  2379.18391111111
2011-04-19 01:52:00  841.224212511672
2011-04-19 02:22:00  471.520308479532
2011-04-19 02:52:00  1189.78122544232
2011-04-19 03:22:00  343.7574197609
2011-04-19 03:52:00  336.486834795322
2011-04-19 04:22:00  541.401434220355
2011-04-19 04:52:00  316.106452883263
2011-04-19 05:22:00  502.502274561404
2011-04-19 05:52:00  314.832323976608
'''
# Split on runs of two or more spaces so the single space inside each timestamp is kept.
df = pd.read_table(io.BytesIO(content.encode('UTF-8')), sep=r'\s{2,}', parse_dates=[0], index_col=[0], engine='python')
See the documentation here: http://pandas-docs.github.io/pandas-docs-travis/
# Recent pandas versions no longer accept how='sum'; call .sum() on the resampler instead.
df = df.resample('30min').sum()
Out[496]:
                           value
timestamp
2011-04-18 16:30:00   152.684299
2011-04-18 17:00:00   327.579073
2011-04-18 17:30:00   156.826946
2011-04-18 18:00:00   330.202764
2011-04-18 18:30:00  1118.604043
2011-04-18 19:00:00   243.972251
2011-04-18 19:30:00   852.888159
2011-04-18 20:00:00   491.859993
2011-04-18 20:30:00   466.738984
2011-04-18 21:00:00   659.670303
2011-04-18 21:30:00   576.304871
2011-04-18 22:00:00  2497.206206
2011-04-18 22:30:00  2790.203921
2011-04-18 23:00:00  1092.209066
2011-04-18 23:30:00   825.994417
2011-04-19 00:00:00  2397.166721
2011-04-19 00:30:00  1411.666593
2011-04-19 01:00:00  2379.183911
2011-04-19 01:30:00   841.224213
2011-04-19 02:00:00   471.520308
2011-04-19 02:30:00  1189.781225
2011-04-19 03:00:00   343.757420
2011-04-19 03:30:00   336.486835
2011-04-19 04:00:00   541.401434
2011-04-19 04:30:00   316.106453
2011-04-19 05:00:00   502.502275
2011-04-19 05:30:00   314.832324
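In case the shifted timestamps look surprising: as far as I understand it, resample puts each reading into the 30-minute bin that contains it and labels the bin with its left edge, so 16:52 ends up under 16:30. Here is a minimal sketch of that, using three readings copied from the sample data (the names index, readings and binned are just for illustration):

```python
import pandas as pd

# Three readings copied from the sample data.
index = pd.to_datetime([
    "2011-04-18 16:52:00",
    "2011-04-18 17:22:00",
    "2011-04-18 17:52:00",
])
readings = pd.Series(
    [152.684299188514, 327.579073188405, 156.826945856169],
    index=index,
)

# Each reading lands in the 30-minute bin that contains it; with the
# defaults, the bin is labelled by its left edge (16:52 -> 16:30, ...).
binned = readings.resample("30min").sum()
print(binned)
```

Since there is only one reading per bin here, the sums are simply the original values under new labels.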
# Same change for the daily totals: .sum() replaces how='sum'.
df = df.resample('1D').sum()
Out[497]:
                   value
timestamp
2011-04-18  12582.945297
2011-04-19  11045.629711
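To connect this back to the plot in the question, here is a rough sketch of reading the data, summing per day and drawing one bar per day. I am assuming total_watt.csv has no header row and two comma-separated columns (timestamp, value), as the question's csv.reader loop suggests. The column names, the daily_totals helper and the in-memory sample are made up for illustration.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd


def daily_totals(csv_source):
    """Read (timestamp, value) rows and return the per-day sum as a Series."""
    df = pd.read_csv(
        csv_source,
        header=None,                   # assumption: the file has no header row
        names=["timestamp", "value"],  # hypothetical column names
        parse_dates=["timestamp"],
        index_col="timestamp",
    )
    return df["value"].resample("1D").sum()


# Stand-in for the real file: a few rows copied from the sample data.
# With the real data you would pass 'total_watt.csv' instead.
sample = io.StringIO(
    "2011-04-18 22:52:00,2790.20392088608\n"
    "2011-04-18 23:22:00,1092.20906629318\n"
    "2011-04-19 00:22:00,2397.16672089666\n"
)
daily = daily_totals(sample)

# One bar per day. read_csv has already parsed the values as floats,
# so matplotlib gets numbers rather than strings.
fig, ax = plt.subplots()
ax.bar(daily.index.strftime("%Y-%m-%d"), daily.values)
ax.set_xlabel("Day")
ax.set_ylabel("Total value")
```

Add plt.show() at the end to display the figure. Note that the csv.reader loop in the question leaves the values as strings, whereas read_csv converts them to floats.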
Hope this helps!