Question

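I was struggling to plot a few layers of a chart until I realized the layer specification wasn't the problem. The problem is the way I slice the data for the chart, which behaves in a way that seems strange to me. If this isn't a bug, then I must be misunderstanding how things are supposed to work.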

Here is a concrete example that shows what happens and why I think it shouldn't work this way.

import altair as alt
alt.renderers.enable('notebook')

import pandas as pd

idx = pd.IndexSlice

history_index = pd.date_range(start="31jan2016", end="30jun2019", freq="M")
forecast_index = pd.date_range(start="31jan2019", end="31dec2019", freq="M")

history_df = pd.DataFrame([z for z in range(len(history_index))], index=history_index,columns = ['history'])
forecast_df = pd.DataFrame([z for z in range(len(forecast_index))], index=forecast_index, columns = ['forecast'])

df = history_df.join(forecast_df, how="outer")
df.index.name = "date"

The first example works:

#without making it a seasonal chart,  this works
non_seasonal  = alt.Chart(df.loc[idx['20170701':],:].reset_index(), title=f"non seasonal plot").mark_line().encode(
        x='date',
        y=alt.Y(f'forecast', scale=alt.Scale(zero=False)),
    )
non_seasonal

But when I turn these into seasonal charts by putting the month on the x-axis, problems appear.

My first slice works: it simply covers all of the existing forecast data, which starts in January 2019.

#works ok: shows all the data since 1jan2019
seasonal1 = alt.Chart(df.loc[idx['20190101':],:].reset_index(), title=f"seasonal plot").mark_line().encode(
        x='month(date)',
        y=alt.Y(f'forecast', scale=alt.Scale(zero=False)),
    )
seasonal1

But when I start the slice at an earlier date (before there is any "forecast" data), I run into trouble.

#fails:  shows no data
seasonal2 = alt.Chart(df.loc[idx['20180101':],:].reset_index(), title=f"seasonal plot").mark_line().encode(
        x='month(date)',
        y=alt.Y(f'forecast', scale=alt.Scale(zero=False)),
    )
seasonal2

I can make the data show up if I add a color encoding, but that isn't ultimately a workable solution for me.

#works if I add a color-encoding
seasonal3 = alt.Chart(df.loc[idx['20180101':],:].reset_index(), title=f"seasonal plot").mark_line().encode(
        x='month(date)',
        y=alt.Y(f'forecast', scale=alt.Scale(zero=False)),
        color="year(date):N"
    )
seasonal3

This is where things get strange. If I start the slice anywhere in 2018, the "start" of the slice seems to act as the "end" of the slice instead.

#fails bizarrely -- the 20180701 slice appears to be the END of the slice, not the start
seasonal4 = alt.Chart(df.loc[idx['20180701':],:].reset_index(), title=f"seasonal plot").mark_line().encode(
        x='month(date)',
        y=alt.Y(f'forecast', scale=alt.Scale(zero=False)),
    )
seasonal4

Again, it works if I give it a color encoding:

#again, it works if I add a color encoding.
seasonal5 = alt.Chart(df.loc[idx['20180701':],:].reset_index(), title=f"seasonal plot").mark_line().encode(
        x='month(date)',
        y=alt.Y(f'forecast', scale=alt.Scale(zero=False)),
        color="year(date):N"
    )
seasonal5

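So the obvious quick fix is to add a color encoding. But that doesn't work for me, because I'm trying to layer several sets of data on this chart (historical data colored by year) with the forecast data hard-coded in red.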

============================================

Based on Jake's answer below, I got the final result I wanted:

forecast = alt.Chart(df.loc[idx['20180101':],'forecast'].reset_index().dropna(), title=f"seasonal plot").mark_line(color="green").encode(
        x='month(date)',
        y=alt.Y(f'forecast', scale=alt.Scale(zero=False)),
    )

history = alt.Chart(df.loc[idx['20170101':],'history'].reset_index().dropna(), title=f"seasonal plot").mark_line().encode(
        x='month(date)',
        y=alt.Y(f'history', scale=alt.Scale(zero=False)),
        color="year(date):O"
    )

forecast+history

Answer 1

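If you change mark_line() to mark_point(), you'll see that the data is actually there; it just isn't shown in the line chart. Why? Because a line is only drawn between adjacent non-null points.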

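Take a look at the output of df.loc[idx['20180101':],:]: you'll see that it contains many rows, many of which hold NaN values. When you extract the month from the index, these NaN values end up interspersed among the defined values that share the same month, which creates breaks in the line. In some cases there are so many breaks that no two adjacent points are both non-null, so no line is drawn at all.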
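To make this concrete without needing Altair, here is a pandas-only sketch that mimics what happens with x='month(date)'. It assumes the line mark orders points by month and keeps rows that share a month in their original (year) order, then counts how many adjacent pairs are both non-null, i.e. how many line segments could be drawn. The helper names build_frame and count_line_segments are hypothetical, and using pd.offsets.MonthEnd() instead of freq="M" is an assumption to avoid the deprecated alias in newer pandas.

```python
import pandas as pd


def build_frame():
    # Same data as in the question (MonthEnd() used instead of the "M" alias).
    month_end = pd.offsets.MonthEnd()
    history_index = pd.date_range("2016-01-31", "2019-06-30", freq=month_end)
    forecast_index = pd.date_range("2019-01-31", "2019-12-31", freq=month_end)
    history_df = pd.DataFrame({"history": range(len(history_index))}, index=history_index)
    forecast_df = pd.DataFrame({"forecast": range(len(forecast_index))}, index=forecast_index)
    frame = history_df.join(forecast_df, how="outer")
    frame.index.name = "date"
    return frame


def count_line_segments(frame, column):
    # Hypothetical helper: order rows by month only (stable sort, so 2018 rows
    # stay ahead of 2019 rows with the same month), then count adjacent pairs
    # in which both values of `column` are non-null.
    data = frame.reset_index()
    data["month"] = data["date"].dt.month
    defined = data.sort_values("month", kind="mergesort")[column].notna().to_numpy()
    return int((defined[:-1] & defined[1:]).sum())


df = build_frame()
segments = {
    start: count_line_segments(df.loc[start:], "forecast")
    for start in ["2018-01-01", "2018-07-01", "2019-01-01"]
}
print(segments)
```

Under these assumptions the counts are 0 for a 2018-01-01 start (every 2019 value sits between NaN rows, so nothing is drawn), 5 for 2018-07-01 (only January to June 2019 connect, which is why the slice start appears to become the end of the line), and 11 for 2019-01-01.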

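Incidentally, this is why adding a color encoding improves the situation: the null data from earlier years is no longer in the same group as the defined data, so the adjacent points within each group are non-null and a line can be drawn.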
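The effect of color="year(date):N" can be approximated the same way: split the rows into one group per year and order by month only inside each group. This is a sketch under the assumption that each color group is drawn as its own independently ordered line; build_frame and count_segments_per_year are hypothetical helpers.

```python
import pandas as pd


def build_frame():
    # Same data as in the question (MonthEnd() used instead of the "M" alias).
    month_end = pd.offsets.MonthEnd()
    history_index = pd.date_range("2016-01-31", "2019-06-30", freq=month_end)
    forecast_index = pd.date_range("2019-01-31", "2019-12-31", freq=month_end)
    history_df = pd.DataFrame({"history": range(len(history_index))}, index=history_index)
    forecast_df = pd.DataFrame({"forecast": range(len(forecast_index))}, index=forecast_index)
    frame = history_df.join(forecast_df, how="outer")
    frame.index.name = "date"
    return frame


def count_segments_per_year(frame, column):
    # One "line" per year: order each year's rows by month and count adjacent
    # pairs in which both values of `column` are non-null.
    data = frame.reset_index()
    data["year"] = data["date"].dt.year
    data["month"] = data["date"].dt.month
    counts = {}
    for year, group in data.groupby("year"):
        defined = group.sort_values("month", kind="mergesort")[column].notna().to_numpy()
        counts[int(year)] = int((defined[:-1] & defined[1:]).sum())
    return counts


df = build_frame()
per_year = count_segments_per_year(df.loc["2018-01-01":], "forecast")
print(per_year)
```

Here the 2018 group contributes no segments (its forecast values are all NaN), while the 2019 group keeps all 11 segments because no NaN rows are interleaved with it.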

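To fix this, I'd suggest being more careful about how you slice your data and/or filtering NaN values out of the slices you create. For example, for your seasonal2 chart you could do this: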

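# drop only rows where 'forecast' is NaN; a bare dropna() would also drop
# Jul-Dec 2019, where 'history' is NaN
df_sliced = df.loc[idx['20180101':],:].dropna(subset=['forecast']).reset_index()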
seasonal2 = alt.Chart(df_sliced, title=f"seasonal plot").mark_line().encode(
        x='month(date)',
        y=alt.Y(f'forecast', scale=alt.Scale(zero=False)),
    )
seasonal2
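When filtering with dropna() here, it helps to check what is kept: with no arguments it drops every row that has a NaN in any column, so the July to December 2019 rows (where history is NaN) disappear along with the 2018 rows. The pandas sketch below compares the row counts; subset=["forecast"] is a standard dropna parameter, while build_frame is a hypothetical helper that rebuilds the question's data.

```python
import pandas as pd


def build_frame():
    # Same data as in the question (MonthEnd() used instead of the "M" alias).
    month_end = pd.offsets.MonthEnd()
    history_index = pd.date_range("2016-01-31", "2019-06-30", freq=month_end)
    forecast_index = pd.date_range("2019-01-31", "2019-12-31", freq=month_end)
    history_df = pd.DataFrame({"history": range(len(history_index))}, index=history_index)
    forecast_df = pd.DataFrame({"forecast": range(len(forecast_index))}, index=forecast_index)
    frame = history_df.join(forecast_df, how="outer")
    frame.index.name = "date"
    return frame


sliced = build_frame().loc["2018-01-01":]
any_column = sliced.dropna()                        # needs both columns non-null
forecast_only = sliced.dropna(subset=["forecast"])  # needs only 'forecast' non-null
print(len(sliced), len(any_column), len(forecast_only))
print(forecast_only.index.min().date(), forecast_only.index.max().date())
```

The slice has 24 rows; the bare dropna() keeps only the 6 months where history and forecast overlap (January to June 2019), while the subset version keeps all 12 forecast months.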

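Another option is to use yearmonth instead of month when extracting the date, which prevents the undefined data from being interspersed with the defined data: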

seasonal2 = alt.Chart(df.loc[idx['20180101':],:].reset_index(), title=f"seasonal plot").mark_line().encode(
        x='yearmonth(date)',
        y=alt.Y(f'forecast', scale=alt.Scale(zero=False)),
    )
seasonal2
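The yearmonth version helps because ordering by year and month keeps all of the 2018 rows (NaN forecast) together ahead of the 2019 rows instead of alternating with them month by month. A pandas approximation, using to_period("M") as a stand-in for Vega-Lite's yearmonth time unit (an assumption for illustration); build_frame and count_segments are hypothetical helpers.

```python
import pandas as pd


def build_frame():
    # Same data as in the question (MonthEnd() used instead of the "M" alias).
    month_end = pd.offsets.MonthEnd()
    history_index = pd.date_range("2016-01-31", "2019-06-30", freq=month_end)
    forecast_index = pd.date_range("2019-01-31", "2019-12-31", freq=month_end)
    history_df = pd.DataFrame({"history": range(len(history_index))}, index=history_index)
    forecast_df = pd.DataFrame({"forecast": range(len(forecast_index))}, index=forecast_index)
    frame = history_df.join(forecast_df, how="outer")
    frame.index.name = "date"
    return frame


def count_segments(frame, column, sort_key):
    # Order rows by month only or by year+month, then count adjacent pairs in
    # which both values of `column` are non-null.
    data = frame.reset_index()
    if sort_key == "month":
        data["key"] = data["date"].dt.month
    else:  # "yearmonth"
        data["key"] = data["date"].dt.to_period("M")
    defined = data.sort_values("key", kind="mergesort")[column].notna().to_numpy()
    return int((defined[:-1] & defined[1:]).sum())


sliced = build_frame().loc["2018-01-01":]
by_month = count_segments(sliced, "forecast", "month")
by_yearmonth = count_segments(sliced, "forecast", "yearmonth")
print(by_month, by_yearmonth)
```

With month ordering no segments survive, while year+month ordering leaves the 12 forecast months contiguous, giving 11 segments.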

The other examples can be fixed in a similar way.

Passing a .loc[date] slice into an Altair chart gives strange results depending on the slice date