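I am trying to plot a range of values in pandas. The values are taken from a df and show the total number of values occurring at any point in time.
My attempt is below. The problem is that the x-axis is not ordered correctly after midnight: values whose timestamps fall after midnight are plotted first rather than last (see the figure below).
Output: [figure]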
Answer 0 (score: 2)
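With
df = df.astype({
    "Time1": "datetime64[ns]",
    "Occurring1": int})
every time mark gets the same date (2019-03-05, which is simply today's date). All elements of all_times therefore share that date as well. Building time_grid = np.arange(all_times.min(), all_times.max(), 10*t_min, dtype="datetime64") from them then "gets the wrong curve": the time marks after midnight are treated as earlier than the evening ones.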
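The effect can be reproduced in isolation. The following is a minimal sketch, not part of the original code: it attaches one arbitrary date to a few time-only strings (the variable names are made up for illustration) and shows that sorting then places the after-midnight entries first.

```python
import pandas as pd

# Time-only strings in recording order: evening first, then after midnight.
times = pd.Series(["22:30:00", "01:31:00", "02:15:00"])

# Attach one explicit date to every entry (the date itself is arbitrary).
stamps = pd.to_datetime("2019-03-05 " + times, format="%Y-%m-%d %H:%M:%S")

# All entries share the same calendar date ...
same_date = stamps.dt.normalize().nunique() == 1

# ... so sorting puts the after-midnight entries before the evening one.
sorted_times = stamps.sort_values().dt.strftime("%H:%M").tolist()
print(same_date, sorted_times)
```

Because no entry carries a later date, nothing tells the sort (or the interpolation grid) that 01:31 belongs to the following day.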
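There are two strategies to fix this:
Strategy A
If you are happy with the data you see and only dislike where the after-midnight data ends up, you can shift/roll the data. This approach does not change how you extract the data for plotting. I inserted the following steps:
Determine the earliest time mark among the first entries of the Time_i columns (the time at which the time series should start). This is t_start.
Find the first position in time_grid that is at or after t_start. This gives index (code below).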
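The two steps can be tried on a toy array first. This is only a sketch with made-up numbers (grid_hours, values and t_start_hour are hypothetical stand-ins for time_grid, the interpolated occurrences and t_start):

```python
import numpy as np

# Hypothetical grid in sorted order: hours of the day in 4-hour steps.
grid_hours = np.array([0, 4, 8, 12, 16, 20])
values = np.array([40, 30, 10, 20, 30, 50])  # made-up value per grid point

t_start_hour = 8  # assumed start of the time series

# np.argmax on a boolean array returns the position of the first True.
index = int(np.argmax(grid_hours >= t_start_hour))

# Roll left by `index`: the series now starts at t_start and the
# after-midnight part wraps around to the end.
rolled_hours = np.roll(grid_hours, -index)
rolled_values = np.roll(values, -index)
print(index, rolled_hours.tolist(), rolled_values.tolist())
```

Note that rolling only reorders values that were already interpolated on the sorted grid; it does not change the interpolation itself.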
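Strategy B
You run into the problem because time marks without a date are periodic. For interpolation, the time axis should increase monotonically. So the approach is: when interpolating with scipy.interpolate.griddata(points, values, xi), use surrogate variables for points and xi that increase monotonically. To do so, you will have to adapt the procedure that determines occurrences_grid.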
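One possible way to build such surrogate variables is sketched below. It is not the answer's code: unwrap_times is a hypothetical helper that assumes the time strings are in recording order and less than 24 hours apart, and np.interp is swapped in for griddata only to keep the sketch dependency-free.

```python
import numpy as np
import pandas as pd

def unwrap_times(time_strings):
    """Return elapsed seconds that keep increasing across midnight.

    Hypothetical helper: assumes recording order and that consecutive
    entries are less than 24 hours apart.
    """
    seconds = pd.to_timedelta(pd.Series(time_strings)).dt.total_seconds().to_numpy()
    # A drop from one entry to the next marks a midnight crossing.
    crossings = np.diff(seconds, prepend=seconds[0]) < 0
    # Add one full day for every crossing seen so far.
    return seconds + 86400.0 * np.cumsum(crossings)

times = ["16:25:00", "22:30:00", "01:31:00", "02:15:00"]
occurring = np.array([4.0, 5.0, 4.0, 3.0])

points = unwrap_times(times)                   # surrogate for `points`
xi = np.arange(points[0], points[-1], 600.0)   # surrogate for `xi`, 10-minute steps
occurrences = np.interp(xi, points, occurring)  # linear interpolation
print(points.tolist())
```

With this kind of axis, the interpolation runs from the evening straight into the early morning, so no rolling is needed afterwards; the tick labels would have to be converted back to clock times for display.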
Here is the code for Strategy A.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

d = {
    'Time1' : ['8:00:00','10:30:00','12:40:00','16:25:00','22:30:00','1:31:00','2:15:00','2:20:00','2:30:00'],
    'Occurring1' : ['1','2','3','4','5','4','3','2','1'],
    'Time2' : ['8:10:00','10:10:00','13:40:00','16:05:00','21:30:00','1:11:00','3:00:00','3:01:00','6:00:00'],
    'Occurring2' : ['1','2','3','4','5','4','3','2','0'],
    'Time3' : ['8:05:00','11:30:00','15:40:00','17:25:00','23:30:00','1:01:00','6:00:00','6:00:00','6:00:00'],
    'Occurring3' : ['1','2','2','3','2','1','0','0','0'],
    'Time4' : ['9:50:00','10:30:00','14:40:00','18:25:00','20:30:00','0:31:00','2:35:00','6:00:00','6:00:00'],
    'Occurring4' : ['1','2','3','4','4','3','2','0','0'],
    'Time5' : ['9:00:00','11:30:00','13:40:00','17:25:00','00:30:00','2:31:00','6:00:00','6:00:00','6:00:00'],
    'Occurring5' : ['1','2','3','3','2','1','0','0','0'],
}
df = pd.DataFrame(data=d)

# Parse the time strings (every entry gets the same date) and the counts.
# np.int was removed from NumPy, so the built-in int is used instead.
df = df.astype({
    "Time1": "datetime64[ns]",
    "Occurring1": int,
    "Time2": "datetime64[ns]",
    "Occurring2": int,
    "Time3": "datetime64[ns]",
    "Occurring3": int,
    "Time4": "datetime64[ns]",
    "Occurring4": int,
    "Time5": "datetime64[ns]",
    "Occurring5": int,
})

all_times = df[["Time1", "Time2", "Time3", "Time4", "Time5"]].values

# new: t_start is the earliest of the first entries of all Time_i columns
t_start = min(df["Time1"].iloc[0], df["Time2"].iloc[0], df["Time3"].iloc[0],
              df["Time4"].iloc[0], df["Time5"].iloc[0])
t_start = t_start.to_datetime64()  # conversion pandas -> numpy

t_min = np.timedelta64(int(60*1e9), "ns")  # one minute in nanoseconds
time_grid = np.arange(all_times.min(), all_times.max(), 10*t_min, dtype="datetime64")

# new: index of the first grid point at or after t_start (start of the graphics)
index = np.argmax(time_grid >= t_start)
print('index')
print(index, time_grid[index])

X = pd.Series(time_grid).dt.time.values

# Interpolate each series linearly onto the common 10-minute grid
occurrences_grid = np.zeros((5, len(time_grid)))
for i in range(5):
    occurrences_grid[i] = griddata(
        points=df["Time%i" % (i+1)].values.astype("float"),
        values=df["Occurring%i" % (i+1)],
        xi=time_grid.astype("float"),
        method="linear"
    )

occ_min = np.min(occurrences_grid, axis=0)
occ_max = np.max(occurrences_grid, axis=0)
occ_mean = np.mean(occurrences_grid, axis=0)

def roll(X, occ_min, occ_max, occ_mean):  # new: shift/roll the values
    # do not shift X but use a surrogate time axis (0, 1, 2, ...)
    return (np.arange(len(X)), np.roll(occ_min, -index),
            np.roll(occ_max, -index), np.roll(occ_mean, -index))

X, occ_min, occ_max, occ_mean = roll(X, occ_min, occ_max, occ_mean)

plt.style.use('ggplot')  # set the style before the figure is created
fig, ax0 = plt.subplots(figsize=(9,4))
plt.fill_between(X, occ_min, occ_max, color="blue")
plt.plot(X, occ_mean, c="white")
plt.tight_layout()
fig.savefig('plot_model_2.png', transparent=True)  # the keyword is "transparent"
plt.show()