Question

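I have a pandas DataFrame that holds five years of daily time-series data. I want to make a monthly plot from the whole dataset that shows the variation (std or something similar) within each month. I tried to create a similar figure but could not find a way to do it:

For example, I have some pseudo daily precipitation data: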

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

date = pd.to_datetime("1st of Dec, 1999")
dates = date+pd.to_timedelta(np.arange(1900), 'D')
ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum()
df = pd.DataFrame({'pre':ppt},index=dates)

I can do it manually like this:

one   = df['pre']['1999-12-01':'2000-11-29'].values
two   = df['pre']['2000-12-01':'2001-11-30'].values
three = df['pre']['2001-12-01':'2002-11-30'].values
four  = df['pre']['2002-12-01':'2003-11-30'].values
five  = df['pre']['2003-12-01':'2004-11-29'].values
df = pd.DataFrame({'2000':one,'2001':two,'2002':three,'2003':four,'2004':five})
std = df.std(axis=1)
lw = df.mean(axis=1)-std
up = df.mean(axis=1)+std

plt.fill_between(np.arange(365), up, lw, alpha=.4)

I am looking for a more Pythonic way to do this, rather than doing it by hand!

Any help would be appreciated.

Answer 1

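If I understand you correctly, you would like to plot your daily observations against a periodic (monthly) mean +/- one standard deviation. That is what you get in the screenshot below. Never mind the bland design and colour choices; we can get to that if this is something you can use. Please also note that I have replaced your ppt = np.random.rand(1900) with ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum(), just to make the data look a bit more like your screenshot.

Here I have aggregated the daily data by month and retrieved the mean and standard deviation for each month. I then merged that data with the original dataframe, so that you can plot both the source data and the grouped data like this: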

# imports
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib.dates as mdates
import numpy as np

# Data that matches your setup, but with a random
# seed to make it reproducible
np.random.seed(42)
date = pd.to_datetime("1st of Dec, 1999")
dates = date+pd.to_timedelta(np.arange(1900), 'D')
#ppt = np.random.rand(1900)
ppt = np.random.normal(loc=0.0, scale=1.0, size=1900).cumsum()

df = pd.DataFrame({'ppt':ppt},index=dates)

# A subset (copied, so that adding a column below does not
# trigger a SettingWithCopyWarning)
df = df.tail(200).copy()

# Add a yearmonth column
df['YearMonth'] = df.index.map(lambda x: 100*x.year + x.month)

# Create aggregated dataframe
df2 = df.groupby('YearMonth').agg(['mean', 'std']).reset_index()
df2.columns = ['YearMonth', 'mean', 'std']

# Merge original data and aggregated data
df3 = pd.merge(df,df2,how='left',on=['YearMonth'])
df3 = df3.set_index(df.index)
df3 = df3[['ppt', 'mean', 'std']]

# Function to make your plot
def monthplot():
    fig, ax = plt.subplots(1)
    ax.set_facecolor('white')

    # Define upper and lower bounds for shaded variation
    lower_bound = df3['mean'] - df3['std']
    upper_bound = df3['mean'] + df3['std']

    # Source data and mean
    ax.plot(df3.index,df3['mean'], lw=0.5, color = 'red')
    ax.plot(df3.index, df3['ppt'], lw=0.1, color = 'blue')

    # Variation and shaded area
    ax.fill_between(df3.index, lower_bound, upper_bound, facecolor='grey', alpha=0.5)

    fig = ax.get_figure()

    # Assign months to X axis
    locator = mdates.MonthLocator()  # every month
    # Specify the format - %b gives us Jan, Feb...
    fmt = mdates.DateFormatter('%b')

    X = plt.gca().xaxis
    X.set_major_locator(locator)
    X.set_major_formatter(fmt)

    plt.show()

monthplot()

See this post for more on axis formatting, and this post for how to add the YearMonth column.
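The aggregate-then-merge step can also be written without an explicit merge. The following is only a sketch of that alternative, not part of the original answer: it assumes the same synthetic data as above and uses `groupby(...).transform(...)`, which returns the monthly statistic already aligned with the daily index. The names `by_month` and `bands` are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Same synthetic data as above (seeded for reproducibility)
np.random.seed(42)
dates = pd.to_datetime("1999-12-01") + pd.to_timedelta(np.arange(1900), "D")
df = pd.DataFrame({"ppt": np.random.normal(size=1900).cumsum()}, index=dates)

# Group by calendar month; to_period("M") keeps year and month together,
# so December 1999 and December 2000 end up in different groups
by_month = df["ppt"].groupby(df.index.to_period("M"))

# transform() broadcasts each monthly statistic back onto the daily index,
# which replaces the YearMonth column and the merge
bands = pd.DataFrame({
    "ppt": df["ppt"],
    "mean": by_month.transform("mean"),
    "std": by_month.transform("std"),
})
bands["lower"] = bands["mean"] - bands["std"]
bands["upper"] = bands["mean"] + bands["std"]
```

The resulting `bands` frame can be passed to the plotting code above in place of `df3`, for example `ax.fill_between(bands.index, bands["lower"], bands["upper"], alpha=0.5)`.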

Answer 2

There are a few errors in your example, but I don't think that matters. Do you want all the years on the same graph (as in your example)? If so, this may help you:

df['month'] = df.index.strftime("%m-%d")
df['year'] = df.index.year
df.set_index('month').drop(columns=['year']).plot()
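To get the shaded mean +/- std band from the question without slicing each year by hand, one option is to pivot the data so that every year becomes a column. This is a sketch under a few assumptions that are not in the original answer: the series has no missing days, a "season" runs from 1 December to the end of November as in the manual slices, and the column names `season` and `day` and the frames `wide` and `band` are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Synthetic data shaped like the question's (seeded for reproducibility)
np.random.seed(42)
dates = pd.to_datetime("1999-12-01") + pd.to_timedelta(np.arange(1900), "D")
df = pd.DataFrame({"pre": np.random.normal(size=1900).cumsum()}, index=dates)

# Label each day with a December-to-November season:
# December 1999 counts towards 2000, December 2000 towards 2001, ...
df["season"] = df.index.year + (df.index.month == 12).astype(int)

# Position of each day within its season (0 = 1 December);
# this assumes one row per day with no gaps
df["day"] = df.groupby("season").cumcount()

# One column per season, one row per day of the season
wide = df.pivot(index="day", columns="season", values="pre")

# Keep only complete seasons and the first 365 days of each
wide = wide.loc[:, wide.count() >= 365].iloc[:365]

# Mean and spread across seasons for every day
mean = wide.mean(axis=1)
std = wide.std(axis=1)
band = pd.DataFrame({"mean": mean, "lower": mean - std, "upper": mean + std})
```

The band can then be drawn as in the question, for example with `plt.fill_between(band.index, band["upper"], band["lower"], alpha=.4)`, and `wide.plot()` shows the individual years on the same axes.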

Monthly shaded error/std plot in matplotlib from daily time-series data
