How do I create a histogram-like bar chart for my dataset?

Time: 2017-08-04 08:47:07

Tags: python pandas matplotlib

I have the following DataFrame df:

time_diff   avg_trips_per_day
631         1.0
231         1.0
431         1.0
7031        1.0
17231       1.0
20000       20.0
21000       15.0
22000       10.0

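I want to create a histogram-like plot with time_diff on the X axis and avg_trips_per_day on the Y axis, so that I can see how the values are distributed over time_diff. The Y axis should therefore not be the frequency with which each X value occurs in df; it should be avg_trips_per_day. The problem is that I don't know how to put time_diff into bins so that it is treated as a continuous variable.

This is what I tried, but it puts every distinct value of time_diff on the X axis.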

import matplotlib.pyplot as plt
import seaborn as sns

# map avg_trips_per_day onto the "spring" colormap, one color per row
norm = plt.Normalize(df["avg_trips_per_day"].values.min(), df["avg_trips_per_day"].values.max())
colors = plt.cm.spring(norm(df["avg_trips_per_day"]))

plt.figure(figsize=(12,8))
ax = sns.barplot(x="time_diff", y="avg_trips_per_day", data=df, palette=colors)
plt.xticks(rotation='vertical', fontsize=12)
# pass the on/off flag positionally; the b= keyword was removed in newer matplotlib
ax.grid(True, which='major', color='#d3d3d3', linewidth=1.0)
ax.grid(True, which='minor', color='#d3d3d3', linewidth=0.5)
plt.show()

2 Answers:

Answer 0 (score: 4)

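import pandas as pd
import seaborn as sns
from io import StringIO

# sep=r"\s+" splits on any run of whitespace
# (delim_whitespace=True is deprecated in recent pandas versions)
data = pd.read_table(StringIO("""time_diff  avg_trips_per_day
631         1.0
231         1.0
431         1.0
7031        1.0
17231       1.0
20000       20.0
21000       15.0
22000       10.0"""), sep=r"\s+")
# split time_diff into 3 quantile-based bins (roughly the same number of rows in each)
data['timegroup'] = pd.qcut(data['time_diff'], 3)
# one bar per bin; the bar height is the mean avg_trips_per_day within that bin
sns.barplot(x='timegroup', y='avg_trips_per_day', data=data)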


Is this what you want?
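To check what the bars in this answer represent, it may help to print the bins that qcut creates and the per-bin means directly. This is only a sketch: the DataFrame is rebuilt inline from the question's sample data, and group_means and group_sizes are names introduced here for illustration. It relies on seaborn's barplot showing the mean of y per category by default.

```python
import pandas as pd

# Sample data copied from the question
data = pd.DataFrame({
    "time_diff": [631, 231, 431, 7031, 17231, 20000, 21000, 22000],
    "avg_trips_per_day": [1.0, 1.0, 1.0, 1.0, 1.0, 20.0, 15.0, 10.0],
})

# Quantile-based bins: the edges are chosen so that each bin holds
# roughly the same number of rows, so the bins differ in width
data["timegroup"] = pd.qcut(data["time_diff"], 3)

grouped = data.groupby("timegroup", observed=True)["avg_trips_per_day"]
group_means = grouped.mean()   # the bar heights
group_sizes = grouped.size()   # how many rows fall into each bin

print(group_means)
print(group_sizes)
```

Because qcut balances the number of rows per bin rather than the bin width, the X axis is not evenly spaced in time_diff. If equal-width intervals are preferred, pd.cut is the usual alternative.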

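Answer 1 (score: 2)

As you explained yourself, you don't want a histogram but a simple bar chart. As far as I understand, though, you want to bin time_diff for the plot.

The following should help you bin and group your data: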

import numpy as np
import pandas as pd

n_bins = 10
# bin indexes, in case you want to use them for the x axis
x_bins = np.arange(n_bins)
# compute the bin edges (retbins=True returns them as the second value)
_, bins = pd.cut(df['time_diff'], bins=n_bins, retbins=True, right=False)
# regroup your data by the computed bin indexes
binned_data = df['time_diff'].groupby(np.digitize(df['time_diff'], bins)).mean()
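Note that the snippet above averages time_diff itself within each bin. To get the Y values the question asks for, one option is to group avg_trips_per_day by the bin instead. The following is a minimal sketch under some assumptions: the DataFrame is rebuilt from the question's sample data, n_bins = 5 is an arbitrary choice, and time_bin, bin_means and plot_bin_means are names made up for this example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "time_diff": [631, 231, 431, 7031, 17231, 20000, 21000, 22000],
    "avg_trips_per_day": [1.0, 1.0, 1.0, 1.0, 1.0, 20.0, 15.0, 10.0],
})

n_bins = 5  # assumption: choose whatever resolution suits the data

# Equal-width intervals spanning the range of time_diff
df["time_bin"] = pd.cut(df["time_diff"], bins=n_bins)

# Mean of avg_trips_per_day per interval; observed=False keeps
# intervals that contain no rows (their mean is NaN)
bin_means = df.groupby("time_bin", observed=False)["avg_trips_per_day"].mean()
print(bin_means)


def plot_bin_means(means):
    """Draw one bar per interval; empty intervals get no bar."""
    import matplotlib.pyplot as plt  # local import: only needed for drawing

    ax = means.plot.bar(figsize=(12, 8))
    ax.set_xlabel("time_diff (binned)")
    ax.set_ylabel("mean avg_trips_per_day")
    plt.tight_layout()
    plt.show()
```

Calling plot_bin_means(bin_means) draws the chart. Keeping the empty intervals means the bars stay evenly spaced along time_diff, which is closer to how a histogram reads than a plot with one bar per distinct value.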