Summarise a time series with group by, and create a chart with multiple series

Time: 2018-12-21 12:49:51

Tags: python-3.x dataframe charts

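I have time series data, and I want to create a line chart of the number of records per month (x-axis), grouped by sentiment (one line per sentiment).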

The data looks like this:

   created_at                      id                   polarity  sentiment
0  Fri Nov 02 11:22:47 +0000 2018  1058318498663870464  0.000000  neutral
1  Fri Nov 02 11:20:54 +0000 2018  1058318026758598656  0.011905  neutral
2  Fri Nov 02 09:41:37 +0000 2018  1058293038739607552  0.800000  positive
3  Fri Nov 02 09:40:48 +0000 2018  1058292834699231233  0.800000  positive
4  Thu Nov 01 18:23:17 +0000 2018  1058061933243518976  0.233333  neutral
5  Thu Nov 01 17:50:39 +0000 2018  1058053723157618690  0.400000  positive
6  Wed Oct 31 18:57:53 +0000 2018  1057708251758903296  0.566667  positive
7  Sun Oct 28 17:21:24 +0000 2018  1056596810570100736  0.000000  neutral
8  Sun Oct 21 13:00:53 +0000 2018  1053994531845296128  0.136364  neutral
9  Sun Oct 21 12:55:12 +0000 2018  1053993101205868544  0.083333  neutral

So far I have managed to aggregate to monthly totals with the following code:

import pandas as pd

# process_twitter_json() is a helper not shown here; it returns the tweet records
tweets = process_twitter_json(file_name)
#print(tweets[:10])
df = pd.DataFrame.from_records(tweets)
print(df.head(10))

# make the string date into a date field
df['tweet_datetime'] = pd.to_datetime(df['created_at'])
df.index = df['tweet_datetime']

#print('Monthly counts')
monthly_sentiment = df.groupby('sentiment')['tweet_datetime'].resample('M').count()

I'm struggling with charting the data.

  • Do I pivot it so that each discrete value in the sentiment field becomes a separate column?
  • I have tried .unstack() to turn the sentiment values into rows. It is almost there, but the problem is that the dates become string column headers, which is no good for charting.
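To see why the dates end up as column headers, here is a minimal sketch on five rows copied from the sample data above. It assumes the created_at layout shown there (hence the explicit format string, which is my addition) and uses month-start bins ('MS') instead of 'M', because newer pandas versions deprecate the 'M' alias. The grouped resample yields a Series indexed by (sentiment, month), and unstack() with no argument pivots the last index level, the dates, into the columns.

```python
import pandas as pd

# Five rows copied from the sample data in the question (id/polarity omitted)
df = pd.DataFrame({
    'created_at': [
        'Fri Nov 02 11:22:47 +0000 2018',
        'Fri Nov 02 09:41:37 +0000 2018',
        'Thu Nov 01 18:23:17 +0000 2018',
        'Wed Oct 31 18:57:53 +0000 2018',
        'Sun Oct 28 17:21:24 +0000 2018',
    ],
    'sentiment': ['neutral', 'positive', 'neutral', 'positive', 'neutral'],
})

# Assumed format for Twitter's timestamp layout; %z parses the '+0000' offset
df['tweet_datetime'] = pd.to_datetime(df['created_at'], format='%a %b %d %H:%M:%S %z %Y')
df.index = df['tweet_datetime']

# Same computation as in the question, but with month-start ('MS') bins
monthly_sentiment = df.groupby('sentiment')['tweet_datetime'].resample('MS').count()

# With no argument, unstack() moves the LAST index level (the months) into columns
wide = monthly_sentiment.unstack()
print(wide)
```

Naming the level to move instead, e.g. monthly_sentiment.unstack('sentiment'), turns the table the other way round, with one row per month and one column per sentiment.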

1 answer:

Answer 0 (score: 0)

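OK, I changed the monthly aggregation to use pd.Grouper instead of resample. This means that when I call unstack('sentiment'), the resulting DataFrame is vertical (deep and narrow), with the dates as rows, instead of horizontal with the dates as column headers. As a result, I no longer have the problem of the dates being stored as strings when charting.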
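The same approach on five of the sample rows from the question, as a minimal sketch. As before, the explicit format string and the month-start frequency 'MS' (instead of 'M') are my assumptions, not part of the original code. Because the sentiment level is unstacked by name, the months stay in the index and each sentiment becomes a column, which is the shape a multi-line chart needs.

```python
import pandas as pd

# Five rows copied from the sample data in the question
df = pd.DataFrame({
    'created_at': [
        'Fri Nov 02 11:22:47 +0000 2018',
        'Fri Nov 02 09:41:37 +0000 2018',
        'Thu Nov 01 18:23:17 +0000 2018',
        'Wed Oct 31 18:57:53 +0000 2018',
        'Sun Oct 28 17:21:24 +0000 2018',
    ],
    'id': [
        1058318498663870464,
        1058293038739607552,
        1058061933243518976,
        1057708251758903296,
        1056596810570100736,
    ],
    'sentiment': ['neutral', 'positive', 'neutral', 'positive', 'neutral'],
})
df['tweet_datetime'] = pd.to_datetime(df['created_at'], format='%a %b %d %H:%M:%S %z %Y')

# Group on the sentiment column and on calendar months of the datetime column
counts = df.groupby(['sentiment', pd.Grouper(key='tweet_datetime', freq='MS')])['id'].count()

# Unstack the named 'sentiment' level: months become rows, sentiments become columns
result = counts.unstack('sentiment').fillna(0)
print(result)
```

Any month/sentiment pair missing from the grouped counts becomes NaN after unstack, and fillna(0) turns it into a zero count.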

Full code:

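import pandas as pd

# process_twitter_json() is a helper not shown here; it returns the tweet records
tweets = process_twitter_json(file_name)

df = pd.DataFrame.from_records(tweets)

# Parse 'created_at' strings (e.g. 'Fri Nov 02 11:22:47 +0000 2018') into datetimes
df['tweet_datetime'] = pd.to_datetime(df['created_at'])
df.index = df['tweet_datetime']

# Count tweet ids per (sentiment, month); freq='M' bins by calendar month,
# labelled by month end (newer pandas spells this alias 'ME')
grouper = df.groupby(['sentiment', pd.Grouper(key='tweet_datetime', freq='M')]).id.count()

# Move the 'sentiment' level into the columns: one row per month, one column
# per sentiment; missing month/sentiment pairs become NaN, so fill them with 0
result = grouper.unstack('sentiment').fillna(0)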

##=================================================
## Plotly - charts in Jupyter

from plotly import __version__
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go

print(__version__)  # requires version >= 1.9.0

# Load plotly.js into the notebook (connected=True fetches it from a CDN)
init_notebook_mode(connected=True)

# One line per sentiment: x is the monthly date index, y the tweet counts.
# result['...'] raises KeyError if no tweet in the data has that sentiment.
trace0 = go.Scatter(
    x=result.index,
    y=result['positive'],
    name='Positive',
    line=dict(color='rgb(205, 12, 24)', width=4),
)

trace1 = go.Scatter(
    x=result.index,
    y=result['negative'],
    name='Negative',
    line=dict(color='rgb(22, 96, 167)', width=4),
)

trace2 = go.Scatter(
    x=result.index,
    y=result['neutral'],
    name='Neutral',
    line=dict(color='rgb(12, 205, 24)', width=4),
)

data = [trace0, trace1, trace2]

iplot(data)
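If some sentiment never occurs in the data (the sample rows in the question contain no negative tweets), result['negative'] raises a KeyError. One way around that is to build one trace per column that result actually has. In the sketch below, the helper traces_from_wide and the COLOURS mapping are hypothetical (the colours are copied from the three traces above), and the traces are written as plain dicts with a 'type' key, a form plotly also accepts, so the sketch itself runs with pandas alone.

```python
import pandas as pd

# Hypothetical colour mapping, copied from the answer's three traces;
# a sentiment not listed here falls back to plotly's default colour cycle
COLOURS = {
    'positive': 'rgb(205, 12, 24)',
    'negative': 'rgb(22, 96, 167)',
    'neutral': 'rgb(12, 205, 24)',
}

def traces_from_wide(result):
    """Hypothetical helper: one scatter-trace dict per column of `result`."""
    traces = []
    for sentiment in result.columns:
        line = {'width': 4}
        if sentiment in COLOURS:
            line['color'] = COLOURS[sentiment]
        traces.append({
            'type': 'scatter',
            'x': result.index,          # months
            'y': result[sentiment],     # tweet counts for this sentiment
            'name': str(sentiment).capitalize(),
            'line': line,
        })
    return traces

# Small stand-in for the answer's `result` (months as rows, sentiments as columns)
result = pd.DataFrame(
    {'neutral': [1, 2], 'positive': [1, 1]},
    index=pd.to_datetime(['2018-10-31', '2018-11-30']),
)
traces = traces_from_wide(result)
print([t['name'] for t in traces])
```

In the notebook, iplot(traces_from_wide(result)) should then draw the same chart, with only as many lines as there are sentiments in the data.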