pandas: plot conditional values with seaborn

Time: 2018-03-16 02:52:01

Tags: pandas dataframe matplotlib seaborn

My dataframe is:

      created_at            text
2017-03-01 00:00:01        power blah blah
2017-03-01 00:00:11        foo blah blah
2017-03-01 00:01:01        bar blah blah
2017-03-02 00:00:01        foobar blah blah
2017-03-02 00:10:01        hello world
2017-03-02 01:00:01        power blah blah

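created_at is my index, its type is datetime64, and I can easily slice it by day. What I want to plot is the total number of entries per day. I split this dataframe into its categories and plot them all in one chart, but I think there must be a better way that doesn't need multiple dataframes: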

import matplotlib.pyplot as plt

# one filtered dataframe per keyword
a = df[df["text"].str.contains("power")]
b = df[df["text"].str.contains("foo")]
c = df[df["text"].str.contains("bar")]

fig = plt.figure()
ax = fig.add_subplot(111)

# created_at is the index, so group on the index's dates
df.groupby(df.index.date).size().plot(kind="bar", position=0, ax=ax)
a.groupby(a.index.date).size().plot(kind="bar", position=0, ax=ax)
b.groupby(b.index.date).size().plot(kind="bar", position=0, ax=ax)
c.groupby(c.index.date).size().plot(kind="bar", position=0, ax=ax)

plt.show()
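Because created_at is already a DatetimeIndex, the per-day totals can also be computed with resample("D") instead of grouping on the dates, and a keyword count can be taken from a boolean mask without keeping a filtered copy of the dataframe. A minimal sketch, assuming the six sample rows from the table above (rebuilt by hand here):

```python
import pandas as pd

# Rebuild the sample data with created_at as a DatetimeIndex (assumed to match the table above)
df = pd.DataFrame(
    {"text": ["power blah blah", "foo blah blah", "bar blah blah",
              "foobar blah blah", "hello world", "power blah blah"]},
    index=pd.to_datetime([
        "2017-03-01 00:00:01", "2017-03-01 00:00:11", "2017-03-01 00:01:01",
        "2017-03-02 00:00:01", "2017-03-02 00:10:01", "2017-03-02 01:00:01",
    ]),
)
df.index.name = "created_at"

# Total number of entries per calendar day
daily_total = df.resample("D").size()

# Rows containing "power" per day: resample the boolean mask and sum the True values
daily_power = df["text"].str.contains("power").resample("D").sum()

print(daily_total)
print(daily_power)
```

Either Series can then be drawn with .plot(kind="bar").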

I'm learning Seaborn, so a Seaborn-based solution would be great, but it doesn't have to be one. Thanks in advance!

1 Answer

Answer 0 (score: 1)

Since you want to group by day, consider converting df.index to a pd.DatetimeIndex so that you can use df.resample(), as follows:

import matplotlib.pyplot as plt
import pandas as pd

# your original dataframe, keyed by epoch milliseconds
# (pd.read_json expects a JSON string or file, not a dict, so build the frame directly)
df = pd.DataFrame({"text": {1488326401000: "power blah blah", 1488326411000: "foo blah blah", 1488326461000: "bar blah blah", 1488412801000: "foobar blah blah", 1488413401000: "hello world", 1488416401000: "power blah blah"}})

# convert the epoch-millisecond index to a DatetimeIndex
df.index = pd.to_datetime(df.index, unit="ms")

# count how many texts in a group of rows contain each keyword;
# not sure if this is exactly what you want ("foobar" counts for both "foo" and "bar")
def func(df_):
    texts = ['power', 'foo', 'bar']
    d = dict()

    for text in texts:
        d[text] = df_['text'].str.contains(text).sum()

    return pd.Series(d)

# build the plotting dataframe by resampling the data by day and applying `func` to each day's rows
df_plot = df.resample('D').apply(func)

# do the plotting: one group of bars per day, one bar per keyword
df_plot.plot(kind='bar')
plt.show()

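Since the question asks about Seaborn, here is a minimal sketch of the same per-day, per-keyword counts drawn with seaborn.barplot. It assumes the question's sample rows (rebuilt by hand), computes the counts with boolean masks plus resample("D").sum() rather than the apply(func) approach, adds a "total" column for all entries, and reshapes the result to long form with melt, which is the layout seaborn's hue parameter expects. The names flags, daily and long_df are illustrative only.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Sample data with created_at as a DatetimeIndex (assumed to match the question's table)
df = pd.DataFrame(
    {"text": ["power blah blah", "foo blah blah", "bar blah blah",
              "foobar blah blah", "hello world", "power blah blah"]},
    index=pd.to_datetime([
        "2017-03-01 00:00:01", "2017-03-01 00:00:11", "2017-03-01 00:01:01",
        "2017-03-02 00:00:01", "2017-03-02 00:10:01", "2017-03-02 01:00:01",
    ]),
)

keywords = ["power", "foo", "bar"]

# One boolean column per keyword; note that "foobar" matches both "foo" and "bar"
flags = pd.DataFrame({kw: df["text"].str.contains(kw) for kw in keywords}, index=df.index)
flags["total"] = True  # every row counts toward the daily total

# Wide form: one row per day, one column per keyword, values are counts
daily = flags.resample("D").sum()

# Long form for seaborn: one row per (day, keyword) pair
long_df = (
    daily.rename_axis("day")
    .reset_index()
    .melt(id_vars="day", var_name="keyword", value_name="count")
)
long_df["day"] = long_df["day"].dt.date  # plain dates give cleaner tick labels

# hue places the keyword bars side by side within each day
ax = sns.barplot(data=long_df, x="day", y="count", hue="keyword")
ax.set_ylabel("entries per day")
plt.show()
```

If "foobar" should not count toward "foo" and "bar", a word-boundary pattern such as r"\bfoo\b" can be passed to str.contains instead of the bare keyword.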