How do I plot the values of multiple factors of one column across several dates with pandas + matplotlib?

Asked: 2016-01-16 19:02:40

Tags: python pandas matplotlib

I have a data frame that looks like this:

df = pd.DataFrame({'animal': {Timestamp('2014-11-12 00:00:00'): 'dog',
  Timestamp('2014-11-13 00:00:00'): 'rabbit',
  Timestamp('2014-11-14 00:00:00'): 'rabbit',
  Timestamp('2014-11-15 00:00:00'): 'rabbit',
  Timestamp('2014-11-16 00:00:00'): 'rabbit',
  Timestamp('2014-11-17 00:00:00'): 'rabbit',
  Timestamp('2014-11-18 00:00:00'): 'dog',
  Timestamp('2014-11-19 00:00:00'): 'rabbit',
  Timestamp('2014-11-20 00:00:00'): 'dog',
  Timestamp('2014-11-21 00:00:00'): 'dog',
  Timestamp('2014-12-01 00:00:00'): 'rabbit',
  Timestamp('2014-12-02 00:00:00'): 'dog',
  Timestamp('2014-12-03 00:00:00'): 'dog',
  Timestamp('2014-12-04 00:00:00'): 'rabbit',
  Timestamp('2014-12-05 00:00:00'): 'rabbit',
  Timestamp('2014-12-06 00:00:00'): 'dog',
  Timestamp('2014-12-07 00:00:00'): 'dog',
  Timestamp('2014-12-08 00:00:00'): 'rabbit',
  Timestamp('2014-12-09 00:00:00'): 'rabbit',
  Timestamp('2014-12-10 00:00:00'): 'rabbit',
  Timestamp('2014-12-11 00:00:00'): 'rabbit',
  Timestamp('2014-12-12 00:00:00'): 'rabbit',
  Timestamp('2014-12-13 00:00:00'): 'rabbit',
  Timestamp('2014-12-14 00:00:00'): 'rabbit',
  Timestamp('2014-12-15 00:00:00'): 'dog',
  Timestamp('2014-12-16 00:00:00'): 'dog',
  Timestamp('2014-12-17 00:00:00'): 'dog',
  Timestamp('2014-12-18 00:00:00'): 'rabbit',
  Timestamp('2014-12-19 00:00:00'): 'rabbit',
  Timestamp('2014-12-20 00:00:00'): 'dog'},
 'count': {Timestamp('2014-11-12 00:00:00'): 6136,
  Timestamp('2014-11-13 00:00:00'): 14620,
  Timestamp('2014-11-14 00:00:00'): 16437,
  Timestamp('2014-11-15 00:00:00'): 17273,
  Timestamp('2014-11-16 00:00:00'): 15302,
  Timestamp('2014-11-17 00:00:00'): 15180,
  Timestamp('2014-11-18 00:00:00'): 7177,
  Timestamp('2014-11-19 00:00:00'): 16193,
  Timestamp('2014-11-20 00:00:00'): 8226,
  Timestamp('2014-11-21 00:00:00'): 9741,
  Timestamp('2014-12-01 00:00:00'): 26237,
  Timestamp('2014-12-02 00:00:00'): 12146,
  Timestamp('2014-12-03 00:00:00'): 12910,
  Timestamp('2014-12-04 00:00:00'): 25820,
  Timestamp('2014-12-05 00:00:00'): 29323,
  Timestamp('2014-12-06 00:00:00'): 17294,
  Timestamp('2014-12-07 00:00:00'): 15219,
  Timestamp('2014-12-08 00:00:00'): 26174,
  Timestamp('2014-12-09 00:00:00'): 27112,
  Timestamp('2014-12-10 00:00:00'): 27131,
  Timestamp('2014-12-11 00:00:00'): 28268,
  Timestamp('2014-12-12 00:00:00'): 34059,
  Timestamp('2014-12-13 00:00:00'): 39162,
  Timestamp('2014-12-14 00:00:00'): 38314,
  Timestamp('2014-12-15 00:00:00'): 19807,
  Timestamp('2014-12-16 00:00:00'): 20606,
  Timestamp('2014-12-17 00:00:00'): 21552,
  Timestamp('2014-12-18 00:00:00'): 36499,
  Timestamp('2014-12-19 00:00:00'): 42163,
  Timestamp('2014-12-20 00:00:00'): 30301},
 'day': {Timestamp('2014-11-12 00:00:00'): 12,
  Timestamp('2014-11-13 00:00:00'): 13,
  Timestamp('2014-11-14 00:00:00'): 14,
  Timestamp('2014-11-15 00:00:00'): 15,
  Timestamp('2014-11-16 00:00:00'): 16,
  Timestamp('2014-11-17 00:00:00'): 17,
  Timestamp('2014-11-18 00:00:00'): 18,
  Timestamp('2014-11-19 00:00:00'): 19,
  Timestamp('2014-11-20 00:00:00'): 20,
  Timestamp('2014-11-21 00:00:00'): 21,
  Timestamp('2014-12-01 00:00:00'): 1,
  Timestamp('2014-12-02 00:00:00'): 2,
  Timestamp('2014-12-03 00:00:00'): 3,
  Timestamp('2014-12-04 00:00:00'): 4,
  Timestamp('2014-12-05 00:00:00'): 5,
  Timestamp('2014-12-06 00:00:00'): 6,
  Timestamp('2014-12-07 00:00:00'): 7,
  Timestamp('2014-12-08 00:00:00'): 8,
  Timestamp('2014-12-09 00:00:00'): 9,
  Timestamp('2014-12-10 00:00:00'): 10,
  Timestamp('2014-12-11 00:00:00'): 11,
  Timestamp('2014-12-12 00:00:00'): 12,
  Timestamp('2014-12-13 00:00:00'): 13,
  Timestamp('2014-12-14 00:00:00'): 14,
  Timestamp('2014-12-15 00:00:00'): 15,
  Timestamp('2014-12-16 00:00:00'): 16,
  Timestamp('2014-12-17 00:00:00'): 17,
  Timestamp('2014-12-18 00:00:00'): 18,
  Timestamp('2014-12-19 00:00:00'): 19,
  Timestamp('2014-12-20 00:00:00'): 20},
 'month': {Timestamp('2014-11-12 00:00:00'): 11,
  Timestamp('2014-11-13 00:00:00'): 11,
  Timestamp('2014-11-14 00:00:00'): 11,
  Timestamp('2014-11-15 00:00:00'): 11,
  Timestamp('2014-11-16 00:00:00'): 11,
  Timestamp('2014-11-17 00:00:00'): 11,
  Timestamp('2014-11-18 00:00:00'): 11,
  Timestamp('2014-11-19 00:00:00'): 11,
  Timestamp('2014-11-20 00:00:00'): 11,
  Timestamp('2014-11-21 00:00:00'): 11,
  Timestamp('2014-12-01 00:00:00'): 12,
  Timestamp('2014-12-02 00:00:00'): 12,
  Timestamp('2014-12-03 00:00:00'): 12,
  Timestamp('2014-12-04 00:00:00'): 12,
  Timestamp('2014-12-05 00:00:00'): 12,
  Timestamp('2014-12-06 00:00:00'): 12,
  Timestamp('2014-12-07 00:00:00'): 12,
  Timestamp('2014-12-08 00:00:00'): 12,
  Timestamp('2014-12-09 00:00:00'): 12,
  Timestamp('2014-12-10 00:00:00'): 12,
  Timestamp('2014-12-11 00:00:00'): 12,
  Timestamp('2014-12-12 00:00:00'): 12,
  Timestamp('2014-12-13 00:00:00'): 12,
  Timestamp('2014-12-14 00:00:00'): 12,
  Timestamp('2014-12-15 00:00:00'): 12,
  Timestamp('2014-12-16 00:00:00'): 12,
  Timestamp('2014-12-17 00:00:00'): 12,
  Timestamp('2014-12-18 00:00:00'): 12,
  Timestamp('2014-12-19 00:00:00'): 12,
  Timestamp('2014-12-20 00:00:00'): 12}})

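I am trying to draw a line plot of the counts for the two animals over a seven-day period; basically, my goal is to have one time series per animal, both shown on the same chart.

Here is my code: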

df['date'] = pd.to_datetime(df['date'], dayfirst=True, infer_datetime_format = True)
df['animal'] = df['animal'].astype('category')
df = df.set_index('date')

grouped = df.groupby('animal')
for key, group in grouped:
    data = group.groupby(lambda x: x.day)
    data['count'].plot(label=key)


plt.legend()

plt.show()

I am aiming for something that shows the counts for both animals, like this: [image]

The closest I have gotten is the following: [image]

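I feel like I am missing an obvious, big piece here, but I cannot quite work out what it is.

Edit: I could not fully figure out how to sort by month and day, so I have appended some more data to the data frame.

1 Answer:

Answer 0 (score: 3)

Create a day column to store the day-of-month numbers: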

df['day'] = df.index.day

Since we want the days to appear in order along the x-axis, sort the rows by that column as well:

df = df.sort_values(by='day')
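As a quick check of what these two steps produce, here is a minimal sketch on a made-up three-row frame (the name `toy` and its values are illustrative assumptions, not the question's data):

```python
import pandas as pd

# Toy frame indexed by date, deliberately out of order (made-up values).
toy = pd.DataFrame(
    {"count": [3, 1, 2]},
    index=pd.to_datetime(["2014-12-27", "2014-12-25", "2014-12-26"]),
)

toy["day"] = toy.index.day        # day-of-month as integers: 27, 25, 26
toy = toy.sort_values(by="day")   # rows now run 25, 26, 27

days = toy["day"].tolist()
counts = toy["count"].tolist()
```

Keep in mind that a day-of-month number only identifies a date uniquely while the data stays within a single month; once two months are involved, the same day number appears more than once.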

Then you can group by animal and plot each group:

# Group by a plain string: recent pandas versions yield tuple keys for a one-element list
grouped = df.groupby('animal')
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot('day', 'count', label=key, ax=ax)

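Note that group.plot calls DataFrame.plot, which lets you specify which columns to use for the x- and y-axes. By contrast, group['count'].plot calls Series.plot, which assumes that the x-axis is the index and the y-axis is the values of the Series.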
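The difference between the two calls can be seen side by side in a small sketch. The frame `toy` and its values are made up for illustration, and the `Agg` backend is selected only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up frame with a date index plus 'day' and 'count' columns.
toy = pd.DataFrame(
    {"day": [25, 26, 27], "count": [10, 30, 20]},
    index=pd.to_datetime(["2014-12-25", "2014-12-26", "2014-12-27"]),
)

fig, (ax_df, ax_series) = plt.subplots(1, 2)

# DataFrame.plot: x and y are picked by column name.
toy.plot(x="day", y="count", ax=ax_df)

# Series.plot: x comes from the index (dates here), y from the values.
toy["count"].plot(ax=ax_series)

df_x = list(ax_df.get_lines()[0].get_xdata())
series_y = list(ax_series.get_lines()[0].get_ydata())
```

In the left panel the x values are the day numbers; in the right panel pandas takes the x values from the date index.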

import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp


df = pd.DataFrame({'animal': {12: 'dog', 44: 'dog', 47: 'dog', 69: 'rabbit', 76: 'rabbit', 84: 'dog', 122: 'rabbit', 162: 'rabbit', 177: 'rabbit', 190: 'rabbit', 217: 'dog', 219: 'dog', 220: 'dog', 226: 'rabbit'},
 'count': {12: 34573, 44: 30676, 47: 41821, 69: 56880, 76: 73172, 84: 30581, 122: 52895, 162: 58430, 177: 57132, 190: 53903, 217: 32001, 219: 35776, 220: 31095, 226: 53809},
 'date': {12: Timestamp('2014-12-29 00:00:00'), 44: Timestamp('2014-12-28 00:00:00'), 47: Timestamp('2014-12-31 00:00:00'), 69: Timestamp('2014-12-29 00:00:00'), 76: Timestamp('2014-12-31 00:00:00'), 84: Timestamp('2014-12-26 00:00:00'), 122: Timestamp('2014-12-25 00:00:00'), 162: Timestamp('2014-12-30 00:00:00'), 177: Timestamp('2014-12-27 00:00:00'), 190: Timestamp('2014-12-28 00:00:00'), 217: Timestamp('2014-12-27 00:00:00'), 219: Timestamp('2014-12-30 00:00:00'), 220: Timestamp('2014-12-25 00:00:00'), 226: Timestamp('2014-12-26 00:00:00')}})

df['animal'] = df['animal'].astype('category')
df = df.set_index('date')


df['day'] = df.index.day
df = df.sort_values(by='day')
grouped = df.groupby('animal')  # string key, so each group key is a plain label
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot('day', 'count', label=key, ax=ax)

plt.legend(loc='best')

plt.show()


For the revised question, if you want the full date on the x-axis, then the simplest way is to use Series.plot (as you did in your original code):

import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'animal': ['dog', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'rabbit', 'dog', 'dog', 'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'dog', 'dog', 'rabbit', 'rabbit', 'dog'], 'count': [6136, 14620, 16437, 17273, 15302, 15180, 7177, 16193, 8226, 9741, 26237, 12146, 12910, 25820, 29323, 17294, 15219, 26174, 27112, 27131, 28268, 34059, 39162, 38314, 19807, 20606, 21552, 36499, 42163, 30301], 'date': [Timestamp('2014-11-12 00:00:00'), Timestamp('2014-11-13 00:00:00'), Timestamp('2014-11-14 00:00:00'), Timestamp('2014-11-15 00:00:00'), Timestamp('2014-11-16 00:00:00'), Timestamp('2014-11-17 00:00:00'), Timestamp('2014-11-18 00:00:00'), Timestamp('2014-11-19 00:00:00'), Timestamp('2014-11-20 00:00:00'), Timestamp('2014-11-21 00:00:00'), Timestamp('2014-12-01 00:00:00'), Timestamp('2014-12-02 00:00:00'), Timestamp('2014-12-03 00:00:00'), Timestamp('2014-12-04 00:00:00'), Timestamp('2014-12-05 00:00:00'), Timestamp('2014-12-06 00:00:00'), Timestamp('2014-12-07 00:00:00'), Timestamp('2014-12-08 00:00:00'), Timestamp('2014-12-09 00:00:00'), Timestamp('2014-12-10 00:00:00'), Timestamp('2014-12-11 00:00:00'), Timestamp('2014-12-12 00:00:00'), Timestamp('2014-12-13 00:00:00'), Timestamp('2014-12-14 00:00:00'), Timestamp('2014-12-15 00:00:00'), Timestamp('2014-12-16 00:00:00'), Timestamp('2014-12-17 00:00:00'), Timestamp('2014-12-18 00:00:00'), Timestamp('2014-12-19 00:00:00'), Timestamp('2014-12-20 00:00:00')]})
df = df.set_index('date')

grouped = df.groupby('animal')  # string key, so each group key is a plain label
fig, ax = plt.subplots()
for key, group in grouped:
    group['count'].plot(label=key, ax=ax)

plt.legend(loc='best')

plt.show()
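On the month-and-day ordering mentioned in the question's edit: with the dates as the index, sorting the index is enough, because timestamps order chronologically (month first, then day). A minimal sketch, using four rows taken from the question's data and deliberately shuffled; the `Agg` backend is an assumption made only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Four rows from the question's data, listed out of order on purpose.
df = pd.DataFrame({
    "animal": ["dog", "rabbit", "dog", "rabbit"],
    "count": [12146, 26237, 6136, 14620],
    "date": pd.to_datetime(
        ["2014-12-02", "2014-12-01", "2014-11-12", "2014-11-13"]
    ),
})

# Chronological order: November rows come before December rows.
df = df.set_index("date").sort_index()

fig, ax = plt.subplots()
for key, group in df.groupby("animal"):
    # Series.plot takes the x values from the date index.
    group["count"].plot(label=key, ax=ax)
ax.legend(loc="best")

labels = [line.get_label() for line in ax.get_lines()]
```

Each group keeps the sorted order of the parent frame, so every line is drawn left to right in time.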
