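My question (stated at the end below) is about plotting the histograms of two DataFrames in separate subplots (Case 1) versus plotting them in the same figure (Case 2). The histograms are plotted using 1-hour intervals as the grouping criterion. Both DataFrames have a single column holding times in "HH:MM" format.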
# Defining the two DataFrames
df_in = pd.DataFrame({'time': ['12:20', '12:06', '11:30', '11:03', '10:44', '10:50', '11:52',
'12:21', '9:58', '12:43','12:56', '13:27', '12:14',]})
df_out = pd.DataFrame({'time': ['19:40', '19:44', '19:21', '20:37', '20:27', '18:46', '19:42',
'18:12', '19:08', '21:09', '18:37', '20:34', '20:15']})
Case 1: Plotting the two DataFrames in separate subplots
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FixedFormatter
fig, axes = plt.subplots(1, 2, figsize=(9, 3))
colors = ['r', 'b']
titles = ['df-in', 'df-out']
# Looping over the dataframes and plotting them in subfigures
for df, ax, c, t in zip([df_in, df_out], axes.flatten(), colors, titles):
    df['hour'] = pd.to_datetime(df['time'], format='%H:%M')
    df.set_index('hour', drop=False, inplace=True)
    df = df['hour'].groupby(pd.Grouper(freq='60Min')).count()
    df.plot(kind='bar', color=c, ax=ax)
    ticklabels = df.index.strftime('%H:%Mh')
    ax.xaxis.set_major_formatter(FixedFormatter(ticklabels))
    ax.set_title(t, fontsize=18)
plt.show()
Output of Case 1
Case 2: Plotting the two DataFrames in the same figure
fig, axes = plt.subplots(figsize=(7, 3))
# Looping over the dataframes and plotting them on the same axes
for df, c, t in zip([df_in, df_out], colors, titles):
    df['hour'] = pd.to_datetime(df['time'], format='%H:%M')
    df.set_index('hour', drop=False, inplace=True)
    df = df['hour'].groupby(pd.Grouper(freq='60Min')).count()
    df.plot(kind='bar', color=c, ax=axes)
    ticklabels = df.index.strftime('%H:%Mh')
    axes.xaxis.set_major_formatter(FixedFormatter(ticklabels))
plt.show()
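Output of Case 2
In both cases, the code for formatting the tick label strings is taken from this question. As you can see, when plotted separately, the red and blue histograms have their maxima at 12:00 h and 19:00 h, respectively. But when I plot them in the same figure, the two histograms overlap and the maxima are no longer at 12:00 h and 19:00 h. The problem seems trivial, but I am not sure what is going wrong.
My question is: what needs to be changed in Case 2 so that the two histograms are clearly positioned around 12:00 h and 19:00 h and are distinguishable instead of overlapping? Any pointers and suggestions are welcome.
Answer 0: (score: 2)
You can also use the powerful hue parameter of seaborn (sns):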
import seaborn as sns
# convert the "HH:MM" strings to datetimes
df_in.time = pd.to_datetime(df_in.time, format='%H:%M')
df_out.time = pd.to_datetime(df_out.time, format='%H:%M')
# mark the series/dataframe and join
df_in['df'] = 'df_in'
df_out['df'] = 'df_out'
df = pd.concat((df_in, df_out))
# groupby hours; size() leaves the counts in a column named 0
df = df.groupby(['df', df.time.dt.hour]).size().reset_index()
# plot with sns; dodge=False keeps each bar centred on its hour
plt.figure(figsize=(10, 6))
sns.barplot(x='time',
            y=0,
            hue='df',
            dodge=False,
            data=df)
plt.show()
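Output:
Edit: to plot the bars with the x-axis running from 7 to 23, we can reindex before plotting (this replaces the groupby step above):
df = (df.groupby(['df', df.time.dt.hour]).size()
        .reset_index(level=0).reindex(range(7, 24))
        .reset_index()
     )
With this, the sns plot gives: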
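If you would rather stay in pandas, the same reindex idea can be sketched without seaborn. This is only a rough sketch under my own assumptions: hourly_counts and counts are names I made up for illustration, and the 7 to 23 hour range is simply the one used in the edit above.

```python
import pandas as pd

df_in = pd.DataFrame({'time': ['12:20', '12:06', '11:30', '11:03', '10:44', '10:50', '11:52',
                               '12:21', '9:58', '12:43', '12:56', '13:27', '12:14']})
df_out = pd.DataFrame({'time': ['19:40', '19:44', '19:21', '20:37', '20:27', '18:46', '19:42',
                                '18:12', '19:08', '21:09', '18:37', '20:34', '20:15']})


def hourly_counts(df, hours=range(7, 24)):
    """Count rows per hour of day; hours without data are filled with 0.

    Hypothetical helper, not part of pandas.
    """
    hour = pd.to_datetime(df['time'], format='%H:%M').dt.hour
    return hour.value_counts().reindex(hours, fill_value=0)


# One row per hour, one column per DataFrame, so both share the same x positions
counts = pd.DataFrame({'df_in': hourly_counts(df_in),
                       'df_out': hourly_counts(df_out)})
counts.index.name = 'hour'
```

Because both columns share one hour index, counts.plot(kind='bar', color=['r', 'b']) should then draw the two sets of bars side by side on a common axis.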
Answer 1: (score: 1)
A numeric bar plot could look like this:
import pandas as pd
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
import matplotlib.pyplot as plt
from matplotlib.dates import HourLocator, DateFormatter
# Defining the two DataFrames
df_in = pd.DataFrame({'time': ['12:20', '12:06', '11:30', '11:03', '10:44', '10:50', '11:52',
'12:21', '9:58', '12:43','12:56', '13:27', '12:14',]})
df_out = pd.DataFrame({'time': ['19:40', '19:44', '19:21', '20:37', '20:27', '18:46', '19:42',
'18:12', '19:08', '21:09', '18:37', '20:34', '20:15']})
colors = ['r', 'b']
titles = ['df-in', 'df-out']
fig, ax = plt.subplots(figsize=(7, 3))
for df, c, t in zip([df_in, df_out], colors, titles):
    df['hour'] = pd.to_datetime(df['time'], format='%H:%M')
    df.set_index('hour', drop=False, inplace=True)
    df = df['hour'].groupby(pd.Grouper(freq='60Min')).count()
    df.index = pd.to_datetime(df.index)
    ax.bar(df.index, df.values, width=1/24/2, color=c, label=t)
ax.xaxis.set_major_locator(HourLocator())
ax.xaxis.set_major_formatter(DateFormatter("%H:%Mh"))
ax.set_xlim(pd.to_datetime(["1900-01-01 07:00", "1900-01-01 23:00"]))
plt.setp(ax.get_xticklabels(), rotation=90)
plt.tight_layout()
plt.show()
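As a side note on why a numeric axis helps here: as far as I know, pandas' kind='bar' draws the bars at integer positions 0, 1, 2, ... and uses the index only for the tick labels, so in Case 2 both series start at the same x position regardless of their hours. A minimal sketch to check that assumption, with shortened sample data (hourly, s_in and s_out are just illustrative names):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def hourly(times):
    """Hypothetical helper: number of entries per 60-minute bin."""
    idx = pd.to_datetime(times, format='%H:%M').sort_values()
    return pd.Series(1, index=idx).resample('60min').sum()


s_in = hourly(['12:20', '11:30', '10:44', '9:58'])   # bins 09:00 to 12:00
s_out = hourly(['19:40', '18:46', '20:37'])          # bins 18:00 to 20:00

fig, ax = plt.subplots()

# First series: record where the bar centres end up on the x-axis
s_in.plot(kind='bar', color='r', ax=ax)
n_in = len(ax.patches)
centers_in = [round(p.get_x() + p.get_width() / 2, 6) for p in ax.patches]

# Second series on the same axes: only look at the newly added bars
s_out.plot(kind='bar', color='b', ax=ax)
centers_out = [round(p.get_x() + p.get_width() / 2, 6) for p in ax.patches[n_in:]]

print(centers_in)
print(centers_out)
```

If the assumption holds, both lists begin at 0 even though the two series cover different hours, which would explain the overlap. Plotting with ax.bar against the datetime index, as above, places each bar at its actual time instead.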