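I'm working with the IGN reviews dataset from Kaggle, and I'm trying to get a frequency plot of the weekday of the release date for each Nintendo platform. This is the code: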
import pandas as pd

df = pd.read_csv("ign.csv")

# Build a datetime column from the separate year/month/day columns
datetime_df = pd.DataFrame({'year': df["release_year"],
                            'month': df["release_month"],
                            'day': df["release_day"]})
df["date"] = pd.to_datetime(datetime_df)
# weekday_name was removed in pandas 1.0; dt.day_name() replaces it
df["week_day"] = df["date"].dt.day_name()

nintendo = ['Wii', 'Nintendo DS', 'Nintendo 3DS',
            'Game Boy', 'Game Boy Color', 'Nintendo 64DD', 'Game Boy Advance',
            'New Nintendo 3DS', 'GameCube', 'Nintendo DSi', 'Super NES']
base_nintendo = df[df["platform"].isin(nintendo)]
# Count releases per (platform, weekday); missing combinations become 0
data = base_nintendo.groupby(["platform", "week_day"]).size()
data = data.unstack().fillna(0).stack()
data
Output:
platform          week_day
Game Boy          Friday         5.0
                  Monday         5.0
                  Saturday       0.0
                  Sunday         0.0
                  Thursday       0.0
                  Tuesday        4.0
                  Wednesday      8.0
Game Boy Advance  Friday       131.0
                  Monday       109.0
                  Saturday       0.0
                  Sunday         1.0
                  Thursday     153.0
                  Tuesday      123.0
                  Wednesday    106.0
Game Boy Color    Friday        89.0
                  Monday        43.0
                  Saturday       1.0
                  Sunday         1.0
                  Thursday      55.0
                  Tuesday       78.0
                  Wednesday     89.0
GameCube          Friday        99.0
                  Monday       100.0
                  Saturday       3.0
                  Sunday         0.0
                  Thursday      83.0
                  Tuesday      124.0
                  Wednesday    100.0
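The listing above sorts weekdays alphabetically. As a side note, here is a minimal sketch of building the same kind of count table in calendar order. It uses a small made-up DataFrame in place of ign.csv, and pd.crosstab plus reindex is just one possible approach; the names `weekdays` and `table` are illustrative, not from the original code.

```python
import pandas as pd

# Toy data (made up) standing in for the platform and date columns of ign.csv.
df = pd.DataFrame({
    "platform": ["Wii", "Wii", "GameCube", "Game Boy"],
    "date": pd.to_datetime(
        ["2024-01-01", "2024-01-08", "2024-01-02", "2024-01-05"]
    ),
})
df["week_day"] = df["date"].dt.day_name()

# Calendar order for the columns instead of the default alphabetical order.
weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# One row per platform, one column per weekday; absent weekdays are filled with 0.
table = (
    pd.crosstab(df["platform"], df["week_day"])
    .reindex(columns=weekdays, fill_value=0)
)
print(table)
```

Calling `table.stack()` would give back a platform/week_day Series shaped like `data` above, but with integer counts.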
I tried:
data.groupby("platform").plot("barh")
But this only gives me the plot for the last platform (Wii).
Answer 0 (score: 1)
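Did you notice that, above the plot, you got one line of output for each group (e.g. Super NES ....)? Those are the matplotlib.AxesSubplot objects for your other plots.
groupby.plot actually returns one matplotlib.AxesSubplot object per group. The IPython notebook, on the other hand, only displays your last plot.
So the solution is to change your data.groupby("platform").plot("barh") to my_axes = data.groupby("platform").plot("barh") and then handle the axes one by one, for example: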
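for platform, ax in my_axes.items():
    # An Axes has no savefig(); save through its parent Figure instead
    ax.figure.savefig(f"{platform}.png")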
Alternatively, you can do the following:
import matplotlib.pyplot as plt

gp = data.groupby("platform")
f, axes = plt.subplots(5, 5)  # or any other large enough subplot grid
for k, ax in zip(gp.groups, axes.ravel()):
    # Draw each platform's counts on its own subplot
    gp.get_group(k).plot(kind='barh', ax=ax)
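A self-contained sketch of the same subplot-grid idea, for readers who want to try it without the dataset. The counts are made up, and the names `counts`, `n_cols` and `n_rows` are illustrative assumptions; the grid is sized from the number of groups rather than hard-coded, and unused cells are hidden.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the question's `data` Series (counts are made up).
index = pd.MultiIndex.from_product(
    [["Game Boy", "GameCube", "Wii"], ["Monday", "Tuesday", "Friday"]],
    names=["platform", "week_day"],
)
counts = pd.Series([5, 4, 5, 100, 124, 99, 60, 80, 70], index=index)

gp = counts.groupby(level="platform")

# Size the grid from the number of groups instead of hard-coding 5x5.
n_cols = 2
n_rows = math.ceil(gp.ngroups / n_cols)
fig, axes = plt.subplots(n_rows, n_cols, figsize=(8, 3 * n_rows), squeeze=False)

for (platform, group), ax in zip(gp, axes.ravel()):
    # Drop the platform level so each bar is labelled by weekday only.
    group.droplevel("platform").plot(kind="barh", ax=ax, title=platform)

# Hide any grid cells that did not receive a plot.
for ax in axes.ravel()[gp.ngroups:]:
    ax.set_visible(False)

fig.tight_layout()
```

Outside a notebook, something like `fig.savefig("weekday_by_platform.png")` (file name chosen here for illustration) would write all panels to a single image.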
Answer 1 (score: 1)
One solution is to use seaborn and draw a horizontal bar plot (barh).
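import matplotlib.pyplot as plt
import seaborn as sns

# Fill missing (platform, weekday) pairs with 0, then flatten to long form
data = data.unstack().fillna(0).stack()
data = data.reset_index().rename(columns={0: 'value'})

fig, ax = plt.subplots(figsize=(10, 7))
# One group of bars per platform, one coloured bar per weekday
sns.barplot(y='platform', x='value', hue='week_day', data=data, orient='h', ax=ax)
plt.show()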
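If seaborn is not an option, a similar grouped chart can probably be produced with pandas alone by unstacking the weekday level into columns before plotting. The following is only a sketch on made-up counts; `counts` and `table` are illustrative names, not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the question's `data` Series (counts are made up).
index = pd.MultiIndex.from_product(
    [["Game Boy", "GameCube", "Wii"], ["Monday", "Tuesday", "Friday"]],
    names=["platform", "week_day"],
)
counts = pd.Series([5, 4, 5, 100, 124, 99, 60, 80, 70], index=index)

# Rows become platforms and columns become weekdays.
table = counts.unstack("week_day")

# DataFrame.plot draws one group of bars per row and one bar per column.
ax = table.plot(kind="barh", figsize=(10, 7))
ax.set_xlabel("number of releases")
plt.tight_layout()
```

Here every platform shares a single axis, which makes platforms easy to compare but can hide those with few releases next to much larger ones.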