Plot time frequency with Python

Time: 2018-01-21 01:58:08

Tags: python csv matplotlib plot

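I have 180,000 rows in a CSV file, and the third column (Time) looks like 2016-10-20 03:43:11+00:00 (the time is in UTC). How can I plot a chart in Python that shows how many of these rows (tweets) fall into each 5-minute interval over a 2-hour time span of the CSV file? In other words, I want to know how many tweets were posted in each 5-minute interval.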
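The core of the question is a bit of timestamp arithmetic: round each time down to the start of its 5-minute interval, then count how many times each interval start occurs. A minimal standard-library sketch of that idea follows; the helper name `five_minute_bucket` is hypothetical, and the third timestamp is made up to show a second interval. It assumes Python 3.7+ for `datetime.fromisoformat`.

```python
from collections import Counter
from datetime import datetime, timedelta


def five_minute_bucket(ts):
    """Round a datetime down to the start of its 5-minute interval (hypothetical helper)."""
    return ts - timedelta(minutes=ts.minute % 5,
                          seconds=ts.second,
                          microseconds=ts.microsecond)


# Two timestamps from the sample rows plus one made-up value in the next interval.
raw_times = [
    '2016-10-20 03:43:11+00:00',
    '2016-10-20 03:43:14+00:00',
    '2016-10-20 03:47:02+00:00',
]

# fromisoformat accepts the space separator and the +00:00 UTC offset.
counts = Counter(five_minute_bucket(datetime.fromisoformat(t)) for t in raw_times)

for bucket, n in sorted(counts.items()):
    print(bucket.isoformat(), n)
# 2016-10-20T03:40:00+00:00 2
# 2016-10-20T03:45:00+00:00 1
```

Intervals with no tweets simply do not appear in the `Counter`; the pandas approaches handle those gaps for you.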

Some sample rows from the CSV file look like this:

Candidate,ID,Time,Username,Tweet
Clinton,788948653016842240,2016-10-20 03:43:11+00:00,Tamayo_castle,Lorem ipsum dolor sit amet, consectetur adipiscing elit 
Clinton,788948666501464064,2016-10-20 03:43:14+00:00,ThinkCenter1968,Maecenas congue, sem nec suscipit aliquam, lorem enim pl
Clinton,788948673594097664,2016-10-20 03:43:16+00:00,21stCenRevolt,Curabitur nec condimentum lorem. Aliquam a dolor porta
Both,788948662881751040,2016-10-20 03:43:13+00:00,mikeywan,Ut eu sagittis metus. Phasellus ut vulputate dui, nec malesuada 
Both,788948675313696769,2016-10-20 03:43:16+00:00,erwoti,Fusce sit amet aliquet ipsum, quis placerat elit. 
Clinton,788948671756955650,2016-10-20 03:43:15+00:00,isaac_urner,te nisi, vitae bibendum odio. Maecenas hen
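One thing to watch in these sample rows: the Tweet text contains unquoted commas, so a plain `read_csv` would see more than five fields on some lines. If the real file really looks like this (an assumption; it may well be properly quoted), one workaround is to split each line on at most four commas so everything after the fourth comma stays in Tweet. A minimal sketch, with a hypothetical `read_tweets` helper and the sample rows pasted in as text:

```python
import pandas as pd

# The sample rows from the question. The Tweet column contains unquoted commas.
SAMPLE = """Candidate,ID,Time,Username,Tweet
Clinton,788948653016842240,2016-10-20 03:43:11+00:00,Tamayo_castle,Lorem ipsum dolor sit amet, consectetur adipiscing elit
Clinton,788948666501464064,2016-10-20 03:43:14+00:00,ThinkCenter1968,Maecenas congue, sem nec suscipit aliquam, lorem enim pl
Clinton,788948673594097664,2016-10-20 03:43:16+00:00,21stCenRevolt,Curabitur nec condimentum lorem. Aliquam a dolor porta
Both,788948662881751040,2016-10-20 03:43:13+00:00,mikeywan,Ut eu sagittis metus. Phasellus ut vulputate dui, nec malesuada
Both,788948675313696769,2016-10-20 03:43:16+00:00,erwoti,Fusce sit amet aliquet ipsum, quis placerat elit.
Clinton,788948671756955650,2016-10-20 03:43:15+00:00,isaac_urner,te nisi, vitae bibendum odio. Maecenas hen
"""


def read_tweets(text):
    """Hypothetical helper: parse CSV text whose last column may contain commas."""
    lines = text.strip().splitlines()
    header = lines[0].split(',')
    # maxsplit = 4: everything after the 4th comma stays in the Tweet field.
    rows = [line.split(',', len(header) - 1) for line in lines[1:]]
    df = pd.DataFrame(rows, columns=header)
    # "+00:00" offsets give a timezone-aware (UTC) datetime column.
    df['Time'] = pd.to_datetime(df['Time'])
    return df


tweets_df = read_tweets(SAMPLE)
print(tweets_df[['ID', 'Time']])
```

For the real file you would pass in `open(path).read()` instead of `SAMPLE`; for 180,000 rows this should still be manageable in memory.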

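Basically, I'm not sure how to link pd.date_range to tweets_df so that it shows the frequency of tweets in 5-minute intervals over the two hours (as a histogram or any other representative plot).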

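import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Parse the Time column as datetimes instead of plain strings
tweets_df = pd.read_csv('valid_tweets.csv', parse_dates=['Time'])
print(tweets_df)

# 5-minute interval boundaries over the two-hour window (UTC)
bins = pd.date_range('10/20/2016 1:55', '10/20/2016 3:55',
                     freq='5min', tz='UTC')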
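One way to connect a `pd.date_range` to the data, sketched here as an assumption rather than the only approach: label each tweet with the start of its 5-minute interval via `Series.dt.floor`, count the labels, and then `reindex` the counts onto the date range. Intervals with no tweets become 0 and tweets outside the window are dropped. The sketch uses `periods=24` (an illustrative choice) so the date range holds the 24 interval starts from 01:55 to 03:50, and the last interval ends exactly at 03:55.

```python
import pandas as pd

# Tweet times from the question's sample rows (UTC).
times = pd.to_datetime(pd.Series([
    '2016-10-20 03:43:11+00:00',
    '2016-10-20 03:43:14+00:00',
    '2016-10-20 03:43:16+00:00',
    '2016-10-20 03:43:13+00:00',
    '2016-10-20 03:43:16+00:00',
    '2016-10-20 03:43:15+00:00',
]))

# Start of each 5-minute interval in the window 01:55-03:55 UTC.
bucket_starts = pd.date_range('2016-10-20 01:55', periods=24, freq='5min', tz='UTC')

# Floor each time to its interval start, count per interval, then align the
# counts to the date range (missing intervals -> 0, out-of-window tweets dropped).
counts = (times.dt.floor('5min')
               .value_counts()
               .reindex(bucket_starts, fill_value=0))

print(counts[counts > 0])
```

With matplotlib installed, `counts.plot.bar()` turns this into a histogram-style chart with one bar per 5-minute interval.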

1 Answer:

Answer 0 (score: 1)

So with pandas >= 0.19, I would do it like this:

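import pandas
import matplotlib.pyplot as plt

# A 5-minute offset used as the resampling frequency
FIVEMIN = pandas.offsets.Minute(5)

fig, ax = plt.subplots(figsize=(6, 3.5))

ax = (
    pandas.read_csv('data.csv', parse_dates=['Time'])  # parse Time as datetimes
          .resample(FIVEMIN, on='Time')['ID']          # bin rows into 5-minute intervals
          .count()                                     # number of tweets per interval
          .plot.line(ax=ax)
)
plt.show()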
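The snippet above bins the whole file. Since the question asks about a 2-hour window, here is a minimal sketch that filters to that window first and then applies the same `resample(..., on='Time')` call. The `tweets_per_5min` helper and the inline CSV are hypothetical stand-ins for `data.csv`; plotting is left out so the counts are easy to inspect.

```python
import io

import pandas

FIVEMIN = pandas.offsets.Minute(5)

# Hypothetical, properly quoted CSV text standing in for data.csv.
CSV_TEXT = """Candidate,ID,Time,Username,Tweet
Clinton,1,2016-10-20 03:43:11+00:00,a,"first, tweet"
Clinton,2,2016-10-20 03:43:14+00:00,b,second
Both,3,2016-10-20 03:52:00+00:00,c,third
Both,4,2016-10-20 05:10:00+00:00,d,outside the window
"""


def tweets_per_5min(csv_file, start, end):
    """Count tweets per 5-minute interval for start <= Time < end (UTC)."""
    df = pandas.read_csv(csv_file, parse_dates=['Time'])
    start = pandas.Timestamp(start, tz='UTC')
    end = pandas.Timestamp(end, tz='UTC')
    window = df[(df['Time'] >= start) & (df['Time'] < end)]
    # Same call as in the answer; intervals with no tweets get a count of 0.
    return window.resample(FIVEMIN, on='Time')['ID'].count()


counts = tweets_per_5min(io.StringIO(CSV_TEXT), '2016-10-20 01:55', '2016-10-20 03:55')
print(counts)
```

Note that `resample` only spans from the first to the last tweet inside the window (here 03:40 to 03:50), not the full 01:55 to 03:55 range; reindexing onto a `pd.date_range` would be needed to show the leading and trailing empty intervals.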

If you aren't using pandas 0.19 or later, you need to set the index explicitly:

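import pandas
import matplotlib.pyplot as plt

FIVEMIN = pandas.offsets.Minute(5)
fig, ax = plt.subplots(figsize=(6, 3.5))

ax = (
    pandas.read_csv('data.csv', parse_dates=['Time'])
          .set_index('Time')  # older pandas: resample only works on the index
          .resample(FIVEMIN)['ID']
          .count()
          .plot.line(ax=ax)
)
plt.show()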
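Another way to bin on a column without moving it into the index is `groupby` with `pandas.Grouper(key=..., freq=...)`. This is an alternative not mentioned in the answer, and I have not checked which pandas release first supported it, so treat version compatibility as an assumption. A minimal sketch with hypothetical inline data:

```python
import io

import pandas

# Hypothetical CSV text standing in for data.csv.
CSV_TEXT = """Candidate,ID,Time,Username,Tweet
Clinton,1,2016-10-20 03:43:11+00:00,a,first
Clinton,2,2016-10-20 03:43:14+00:00,b,second
Both,3,2016-10-20 03:52:00+00:00,c,third
"""

df = pandas.read_csv(io.StringIO(CSV_TEXT), parse_dates=['Time'])

# Group the rows into 5-minute bins on the Time column, then count IDs per bin.
counts = df.groupby(pandas.Grouper(key='Time', freq='5min'))['ID'].count()
print(counts)
```

The result can be plotted the same way as in the answer, for example with `counts.plot.line()`.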