My dataframe is:
created_at text
2017-03-01 00:00:01 power blah blah
2017-03-01 00:00:11 foo blah blah
2017-03-01 00:01:01 bar blah blah
2017-03-02 00:00:01 foobar blah blah
2017-03-02 00:10:01 hello world
2017-03-02 01:00:01 power blah blah
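created_at is my index. Its type is datetime64, so I can easily slice by day. What I want to plot is the total number of entries per day.
I split this dataframe by category and plot the categories in a single chart. But I think there must be a better way that does not need multiple dataframes.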
import matplotlib.pyplot as plt

# one filtered dataframe per category
a = df[df["text"].str.contains("power")]
b = df[df["text"].str.contains("foo")]
c = df[df["text"].str.contains("bar")]
fig = plt.figure()
ax = fig.add_subplot(111)
# created_at is the index, so group on the index dates
df.groupby(df.index.date).size().plot(kind="bar", ax=ax, position=0)
a.groupby(a.index.date).size().plot(kind="bar", ax=ax, position=0)
b.groupby(b.index.date).size().plot(kind="bar", ax=ax, position=0)
c.groupby(c.index.date).size().plot(kind="bar", ax=ax, position=0)
plt.show()
I am learning Seaborn, so it would be great if the solution used Seaborn, but it does not have to. Thanks in advance!
Answer 0 (score: 1)
Since you want to group by day, consider converting df.index to a pd.DatetimeIndex so that you can use df.resample(), like this:
import pandas as pd
import matplotlib.pyplot as plt

# your original dataframe (index keys are epoch timestamps in milliseconds):
df = pd.DataFrame({"text": {1488326401000: "power blah blah", 1488326411000: "foo blah blah", 1488326461000: "bar blah blah", 1488412801000: "foobar blah blah", 1488413401000: "hello world", 1488416401000: "power blah blah"}})
# convert index to DatetimeIndex
df.index = pd.to_datetime(df.index, unit="ms")
# create function to do your calculations; not sure if this is exactly what you want
def func(df_):
    texts = ['power', 'foo', 'bar']
    d = dict()
    for text in texts:
        # count the rows in this group whose text contains the keyword
        d[text] = df_['text'].str.contains(text).sum()
    return pd.Series(d)
# create your dataframe for plotting by resampling your data by each day and then applying the `func`
df_plot = df.resample('D').apply(func)
# do the plotting
df_plot.plot(kind='bar')
plt.show()
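
If you would rather avoid the per-group apply, the sketch below is one possible alternative: it builds a boolean column per keyword and sums those per day. It assumes created_at is already a DatetimeIndex and that plain substring matching is what you want. The names keywords, flags, daily and plot_daily_counts are made up for illustration, and the Seaborn part is only a guess at the kind of chart you are after.

```python
import pandas as pd

# Sample data mirroring the question; created_at is the DatetimeIndex.
df = pd.DataFrame(
    {"text": ["power blah blah", "foo blah blah", "bar blah blah",
              "foobar blah blah", "hello world", "power blah blah"]},
    index=pd.to_datetime([
        "2017-03-01 00:00:01", "2017-03-01 00:00:11", "2017-03-01 00:01:01",
        "2017-03-02 00:00:01", "2017-03-02 00:10:01", "2017-03-02 01:00:01",
    ]),
)
df.index.name = "created_at"

keywords = ["power", "foo", "bar"]  # assumed category keywords

# One boolean column per keyword: True where the text contains it.
flags = pd.DataFrame(
    {kw: df["text"].str.contains(kw, regex=False) for kw in keywords}
)

# Summing booleans per calendar day gives the per-keyword daily counts.
daily = flags.resample("D").sum()
# Total number of entries per day, regardless of keyword.
daily["total"] = df.resample("D").size()


def plot_daily_counts(daily_counts):
    """Draw grouped bars with Seaborn (hypothetical helper).

    Plotting libraries are imported here so the counting code above
    runs even when they are not installed.
    """
    import matplotlib.pyplot as plt
    import seaborn as sns

    # Seaborn prefers long-form data: one row per (day, category) pair.
    long_form = (
        daily_counts.rename_axis("day")
        .reset_index()
        .melt(id_vars="day", var_name="category", value_name="count")
    )
    long_form["day"] = long_form["day"].dt.date  # drop the time part for tick labels
    ax = sns.barplot(data=long_form, x="day", y="count", hue="category")
    plt.show()
    return ax
```

Calling plot_daily_counts(daily) should draw one group of bars per day with one bar per category. Note that a row such as "foobar blah blah" is counted under both foo and bar, so the keyword columns do not have to add up to the total.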