Using a categorical variable in a DataFrame to color a line plot along the line

Time: 2019-02-04 17:36:07

Tags: python time-series plotly

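I am computing a "state/activity" variable (a string) on per-second sensor data. There are 12 states, and a data set spans 10-12 days on average. I am building a per-second log viewer that shows the sensor data parameters together with the "state/activity". The plot is built as in the example below. I am trying to color the "battle_deaths" line by the values of the "category" column. Plotly has a color attribute, but in every example I have seen it takes numeric values, and I have not been able to map categorical values to colors. Please see the current output and the expected output below (the hand-drawn overlay on the output).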

import pandas as pd
import plotly.offline
import plotly.graph_objs as go
from plotly import tools

# dataframe with time index
data = {
        'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:06.119994',
 '2014-05-01 18:47:07.178768', '2014-05-01 18:47:08.230071', 
'2014-05-01 18:47:09.230071', '2014-05-01 18:47:10.280592', 
'2014-05-01 18:47:11.332662', '2014-05-01 18:47:12.385109', 
'2014-05-01 18:47:13.436523', '2014-05-01 18:47:14.486877'], 
        'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41],
        'category' : ["A", "A","A","A","C","A","B","C","B","B"],
        'chicken_dinners':["4000", "5000", "6000", "-1000","4500", 
                            "5900", "6300", "6712","7788","4681"]
       }

df = pd.DataFrame(data, columns = ['date', 'battle_deaths', 'category', 'chicken_dinners'])
df['date'] = pd.to_datetime(df['date'])
df.index = df['date']
del df['date']

print(df)   

>     date  battle_deaths   category    
>     2014-05-01 18:47:05.069722    34  A
>     2014-05-01 18:47:06.119994    25  A
>     2014-05-01 18:47:07.178768    26  A
>     2014-05-01 18:47:08.230071    15  A
>     2014-05-01 18:47:09.230071    15  C
>     2014-05-01 18:47:10.280592    14  A
>     2014-05-01 18:47:11.332662    26  B
>     2014-05-01 18:47:12.385109    25  C
>     2014-05-01 18:47:13.436523    62  B
>     2014-05-01 18:47:14.486877    41  B



#plot code
random_x = df.index

traceC1 = go.Scattergl(
    x=random_x,
    y=df["battle_deaths"],
    mode='lines+markers',
    name="battle_deaths",
    hoverinfo='x'
)
traceC2 = go.Scattergl(
    x=random_x,
    y=df["chicken_dinners"],
    mode='lines',
    name="chicken_dinners",
    hoverinfo='y'
)  

#append traces to the above colored plot, no need to color other plots
fig_circ = tools.make_subplots(rows=2, cols=1, shared_xaxes=True)
fig_circ.append_trace(traceC1, 1, 1)
fig_circ.append_trace(traceC2, 2, 1)

#custom scales on different sensor data channels
#scaling is important and can't autoscale, because data has 'spikes' all over the place

fig_circ['layout'].update(
    height=1000, width=1600,
    margin=dict(l=100, r=0, t=0, b=0),
    yaxis=dict(range=[0, 100]),
    yaxis2=dict(range=[-50, 500])
)

plotly.offline.plot(fig_circ, filename='sample.html')

[Image: current output] [Image: hand-drawn expected output (keep only the colored line)]

1 answer:

Answer 0 (score: 1)

As of February 2019, there is no simple, direct way to do this.

One possible solution is to:

  • plot multiple traces, each with its own color
  • group traces of the same color via legendgroup
  • set showlegend to False if the category has already been plotted
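
The steps above can be sketched with the standard library alone, by building one plain trace dictionary per run of identical categories. This is only an illustration under stated assumptions: `category_segments` and `COLORS` are hypothetical names, the integer x values stand in for the timestamps, and the color mapping is arbitrary.

```python
from itertools import groupby

# Hypothetical mapping from category to color; any CSS color names would do.
COLORS = {"A": "orange", "B": "green", "C": "red"}


def category_segments(xs, ys, cats, colors=COLORS):
    """Return one trace spec (a plain dict) per run of identical categories."""
    specs = []
    seen = set()  # categories that already have a legend entry
    start = 0
    for cat, run in groupby(cats):
        length = len(list(run))
        # Extend each run by one point so adjacent segments join up.
        stop = min(start + length + 1, len(cats))
        specs.append({
            "type": "scatter",
            "mode": "lines",
            "x": list(xs[start:stop]),
            "y": list(ys[start:stop]),
            "name": cat,
            "legendgroup": cat,             # group identical categories
            "showlegend": cat not in seen,  # one legend entry per category
            "line": {"color": colors[cat]},
        })
        seen.add(cat)
        start += length
    return specs


# Sample data from the question; integers stand in for the timestamps.
xs = list(range(10))
battle_deaths = [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]
category = ["A", "A", "A", "A", "C", "A", "B", "C", "B", "B"]

segments = category_segments(xs, battle_deaths, category)
```

With plotly installed, a list of dictionaries like `segments` should be usable as the `data` argument of `plotly.graph_objs.Figure`, since each dictionary carries a `type` key.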

The code below could probably be optimized, but it should get you started.

import pandas as pd
import plotly
plotly.offline.init_notebook_mode()

# taken from the original question
data = {
        'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:06.119994',
'2014-05-01 18:47:07.178768', '2014-05-01 18:47:08.230071', 
'2014-05-01 18:47:09.230071', '2014-05-01 18:47:10.280592', 
'2014-05-01 18:47:11.332662', '2014-05-01 18:47:12.385109', 
'2014-05-01 18:47:13.436523', '2014-05-01 18:47:14.486877'], 
        'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41],
        'category' : ["A", "A","A","A","C","A","B","C","B","B"]
       }

df = pd.DataFrame(data, columns = ['date', 'battle_deaths', 'category'])
df['date'] = pd.to_datetime(df['date'])
df.index = df['date']
del df['date']

# just an empty figure
fig = plotly.graph_objs.Figure()

# a dict which maps your categorical values to colors
colors = {'A': 'orange',
          'B': 'green',
          'C': 'red'}

# the list which stores categories which were already plotted
already_plotted = []

for i in range(df.shape[0] + 1):
    # create a new trace if the category changes or at the end of the data frame
    if i in (0, df.shape[0]) or cat != df.iloc[i, ]['category']:
        if i != 0:
            if i != df.shape[0]:
                x.append(df.iloc[i,].name)
                y.append(df.iloc[i,]['battle_deaths'])
            trace = plotly.graph_objs.Scatter(x=x, y=y, 
                                              legendgroup=cat,  # group identical categories
                                              showlegend=cat not in already_plotted,  # hide legend if already plotted
                                              name=cat,
                                              marker={'color': colors[df.iloc[i - 1, ]['category']]})
            fig.add_trace(trace)
            already_plotted.append(cat)

        if i == df.shape[0]:
            continue
        cat = df.iloc[i, ]['category']
        x = []
        y = []    

    x.append(df.iloc[i,].name)
    y.append(df.iloc[i,]['battle_deaths'])

plotly.offline.iplot(fig)
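
If colored markers are enough and the line itself can stay a single neutral color, a lighter alternative is to keep one trace and pass a list of colors to the marker. This relies on `marker.color` accepting one color per point, whereas `line.color` takes a single color for the whole trace. The sketch below is not part of the original answer; `COLORS` is a hypothetical mapping and the integer x values again stand in for the timestamps.

```python
# Hypothetical mapping from category to color.
COLORS = {"A": "orange", "B": "green", "C": "red"}

battle_deaths = [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]
category = ["A", "A", "A", "A", "C", "A", "B", "C", "B", "B"]

# One color per data point, looked up from its category.
marker_colors = [COLORS[c] for c in category]

# A single trace spec: neutral line, per-point marker colors.
trace = {
    "type": "scatter",
    "mode": "lines+markers",
    "x": list(range(len(battle_deaths))),
    "y": battle_deaths,
    "line": {"color": "lightgrey"},
    "marker": {"color": marker_colors, "size": 8},
    "text": category,  # show the category on hover
}
```

This keeps the figure at one trace, which may matter for multi-day per-second data, at the cost of losing the per-category legend entries.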