I have a data frame that looks like this:
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'animal': {Timestamp('2014-11-12 00:00:00'): 'dog',
Timestamp('2014-11-13 00:00:00'): 'rabbit',
Timestamp('2014-11-14 00:00:00'): 'rabbit',
Timestamp('2014-11-15 00:00:00'): 'rabbit',
Timestamp('2014-11-16 00:00:00'): 'rabbit',
Timestamp('2014-11-17 00:00:00'): 'rabbit',
Timestamp('2014-11-18 00:00:00'): 'dog',
Timestamp('2014-11-19 00:00:00'): 'rabbit',
Timestamp('2014-11-20 00:00:00'): 'dog',
Timestamp('2014-11-21 00:00:00'): 'dog',
Timestamp('2014-12-01 00:00:00'): 'rabbit',
Timestamp('2014-12-02 00:00:00'): 'dog',
Timestamp('2014-12-03 00:00:00'): 'dog',
Timestamp('2014-12-04 00:00:00'): 'rabbit',
Timestamp('2014-12-05 00:00:00'): 'rabbit',
Timestamp('2014-12-06 00:00:00'): 'dog',
Timestamp('2014-12-07 00:00:00'): 'dog',
Timestamp('2014-12-08 00:00:00'): 'rabbit',
Timestamp('2014-12-09 00:00:00'): 'rabbit',
Timestamp('2014-12-10 00:00:00'): 'rabbit',
Timestamp('2014-12-11 00:00:00'): 'rabbit',
Timestamp('2014-12-12 00:00:00'): 'rabbit',
Timestamp('2014-12-13 00:00:00'): 'rabbit',
Timestamp('2014-12-14 00:00:00'): 'rabbit',
Timestamp('2014-12-15 00:00:00'): 'dog',
Timestamp('2014-12-16 00:00:00'): 'dog',
Timestamp('2014-12-17 00:00:00'): 'dog',
Timestamp('2014-12-18 00:00:00'): 'rabbit',
Timestamp('2014-12-19 00:00:00'): 'rabbit',
Timestamp('2014-12-20 00:00:00'): 'dog'},
'count': {Timestamp('2014-11-12 00:00:00'): 6136,
Timestamp('2014-11-13 00:00:00'): 14620,
Timestamp('2014-11-14 00:00:00'): 16437,
Timestamp('2014-11-15 00:00:00'): 17273,
Timestamp('2014-11-16 00:00:00'): 15302,
Timestamp('2014-11-17 00:00:00'): 15180,
Timestamp('2014-11-18 00:00:00'): 7177,
Timestamp('2014-11-19 00:00:00'): 16193,
Timestamp('2014-11-20 00:00:00'): 8226,
Timestamp('2014-11-21 00:00:00'): 9741,
Timestamp('2014-12-01 00:00:00'): 26237,
Timestamp('2014-12-02 00:00:00'): 12146,
Timestamp('2014-12-03 00:00:00'): 12910,
Timestamp('2014-12-04 00:00:00'): 25820,
Timestamp('2014-12-05 00:00:00'): 29323,
Timestamp('2014-12-06 00:00:00'): 17294,
Timestamp('2014-12-07 00:00:00'): 15219,
Timestamp('2014-12-08 00:00:00'): 26174,
Timestamp('2014-12-09 00:00:00'): 27112,
Timestamp('2014-12-10 00:00:00'): 27131,
Timestamp('2014-12-11 00:00:00'): 28268,
Timestamp('2014-12-12 00:00:00'): 34059,
Timestamp('2014-12-13 00:00:00'): 39162,
Timestamp('2014-12-14 00:00:00'): 38314,
Timestamp('2014-12-15 00:00:00'): 19807,
Timestamp('2014-12-16 00:00:00'): 20606,
Timestamp('2014-12-17 00:00:00'): 21552,
Timestamp('2014-12-18 00:00:00'): 36499,
Timestamp('2014-12-19 00:00:00'): 42163,
Timestamp('2014-12-20 00:00:00'): 30301},
'day': {Timestamp('2014-11-12 00:00:00'): 12,
Timestamp('2014-11-13 00:00:00'): 13,
Timestamp('2014-11-14 00:00:00'): 14,
Timestamp('2014-11-15 00:00:00'): 15,
Timestamp('2014-11-16 00:00:00'): 16,
Timestamp('2014-11-17 00:00:00'): 17,
Timestamp('2014-11-18 00:00:00'): 18,
Timestamp('2014-11-19 00:00:00'): 19,
Timestamp('2014-11-20 00:00:00'): 20,
Timestamp('2014-11-21 00:00:00'): 21,
Timestamp('2014-12-01 00:00:00'): 1,
Timestamp('2014-12-02 00:00:00'): 2,
Timestamp('2014-12-03 00:00:00'): 3,
Timestamp('2014-12-04 00:00:00'): 4,
Timestamp('2014-12-05 00:00:00'): 5,
Timestamp('2014-12-06 00:00:00'): 6,
Timestamp('2014-12-07 00:00:00'): 7,
Timestamp('2014-12-08 00:00:00'): 8,
Timestamp('2014-12-09 00:00:00'): 9,
Timestamp('2014-12-10 00:00:00'): 10,
Timestamp('2014-12-11 00:00:00'): 11,
Timestamp('2014-12-12 00:00:00'): 12,
Timestamp('2014-12-13 00:00:00'): 13,
Timestamp('2014-12-14 00:00:00'): 14,
Timestamp('2014-12-15 00:00:00'): 15,
Timestamp('2014-12-16 00:00:00'): 16,
Timestamp('2014-12-17 00:00:00'): 17,
Timestamp('2014-12-18 00:00:00'): 18,
Timestamp('2014-12-19 00:00:00'): 19,
Timestamp('2014-12-20 00:00:00'): 20},
'month': {Timestamp('2014-11-12 00:00:00'): 11,
Timestamp('2014-11-13 00:00:00'): 11,
Timestamp('2014-11-14 00:00:00'): 11,
Timestamp('2014-11-15 00:00:00'): 11,
Timestamp('2014-11-16 00:00:00'): 11,
Timestamp('2014-11-17 00:00:00'): 11,
Timestamp('2014-11-18 00:00:00'): 11,
Timestamp('2014-11-19 00:00:00'): 11,
Timestamp('2014-11-20 00:00:00'): 11,
Timestamp('2014-11-21 00:00:00'): 11,
Timestamp('2014-12-01 00:00:00'): 12,
Timestamp('2014-12-02 00:00:00'): 12,
Timestamp('2014-12-03 00:00:00'): 12,
Timestamp('2014-12-04 00:00:00'): 12,
Timestamp('2014-12-05 00:00:00'): 12,
Timestamp('2014-12-06 00:00:00'): 12,
Timestamp('2014-12-07 00:00:00'): 12,
Timestamp('2014-12-08 00:00:00'): 12,
Timestamp('2014-12-09 00:00:00'): 12,
Timestamp('2014-12-10 00:00:00'): 12,
Timestamp('2014-12-11 00:00:00'): 12,
Timestamp('2014-12-12 00:00:00'): 12,
Timestamp('2014-12-13 00:00:00'): 12,
Timestamp('2014-12-14 00:00:00'): 12,
Timestamp('2014-12-15 00:00:00'): 12,
Timestamp('2014-12-16 00:00:00'): 12,
Timestamp('2014-12-17 00:00:00'): 12,
Timestamp('2014-12-18 00:00:00'): 12,
Timestamp('2014-12-19 00:00:00'): 12,
Timestamp('2014-12-20 00:00:00'): 12}})
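I'm trying to plot a line chart of the counts for the two animals over a seven-day period; basically, my goal is to have each animal's time series shown on the same chart.
Here is my code: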
df['date'] = pd.to_datetime(df['date'], dayfirst=True, infer_datetime_format = True)
df['animal'] = df['animal'].astype('category')
df = df.set_index('date')
grouped = df.groupby('animal')
for key, group in grouped:
    data = group.groupby(lambda x: x.day)
    data['count'].plot(label=key)
plt.legend()
plt.show()
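I feel like I'm missing an obvious, big piece here, but I can't quite work out what it is.
Edit: I couldn't fully figure out how to sort by month and day, so I've appended some data to the data frame.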
Answer 0 (score: 3)
Create a day column to store the day-of-month numbers:
df['day'] = df.index.day
Since we want the days in order along the x-axis, sort the rows by that column as well:
df = df.sort_values(by='day')
Then you can group by animal and plot each group:
grouped = df.groupby('animal')  # group by a scalar key so each key is a plain string label
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot('day', 'count', label=key, ax=ax)
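Note that group.plot calls DataFrame.plot, which lets you specify which columns to use for the x- and y-axes. In contrast, group['count'].plot calls Series.plot, which assumes the x-axis is the index and the y-axis is the values of the Series.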
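The difference between the two plotting calls can be checked on a tiny example. The following is only a sketch: the toy frame and the names toy, ax_frame, ax_series, frame_x, frame_y and series_y are illustrative assumptions, not part of the question's data. It reads back the x and y data of the plotted lines to see which values each call used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data (not from the question): three dates, deliberately out of order.
toy = pd.DataFrame(
    {"count": [10, 30, 20]},
    index=pd.to_datetime(["2014-12-03", "2014-12-01", "2014-12-02"]),
)
toy["day"] = toy.index.day
toy = toy.sort_values(by="day")

fig, (ax_frame, ax_series) = plt.subplots(1, 2)

# DataFrame.plot: the x and y values come from the named columns.
toy.plot("day", "count", ax=ax_frame)

# Series.plot: x is the index (the dates here), y is the Series values.
toy["count"].plot(ax=ax_series)

# Read back what was actually drawn.
frame_x = [int(v) for v in ax_frame.lines[0].get_xdata()]
frame_y = [int(v) for v in ax_frame.lines[0].get_ydata()]
series_y = [int(v) for v in ax_series.lines[0].get_ydata()]
```

Both calls draw the same y values; only the source of the x values differs. For interactive use, drop the matplotlib.use line and call plt.show() at the end.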
Putting it all together:
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'animal': {12: 'dog', 44: 'dog', 47: 'dog', 69: 'rabbit', 76: 'rabbit', 84: 'dog', 122: 'rabbit', 162: 'rabbit', 177: 'rabbit', 190: 'rabbit', 217: 'dog', 219: 'dog', 220: 'dog', 226: 'rabbit'},
'count': {12: 34573, 44: 30676, 47: 41821, 69: 56880, 76: 73172, 84: 30581, 122: 52895, 162: 58430, 177: 57132, 190: 53903, 217: 32001, 219: 35776, 220: 31095, 226: 53809},
'date': {12: Timestamp('2014-12-29 00:00:00'), 44: Timestamp('2014-12-28 00:00:00'), 47: Timestamp('2014-12-31 00:00:00'), 69: Timestamp('2014-12-29 00:00:00'), 76: Timestamp('2014-12-31 00:00:00'), 84: Timestamp('2014-12-26 00:00:00'), 122: Timestamp('2014-12-25 00:00:00'), 162: Timestamp('2014-12-30 00:00:00'), 177: Timestamp('2014-12-27 00:00:00'), 190: Timestamp('2014-12-28 00:00:00'), 217: Timestamp('2014-12-27 00:00:00'), 219: Timestamp('2014-12-30 00:00:00'), 220: Timestamp('2014-12-25 00:00:00'), 226: Timestamp('2014-12-26 00:00:00')}})
df['animal'] = df['animal'].astype('category')
df = df.set_index('date')
df['day'] = df.index.day
df = df.sort_values(by='day')
grouped = df.groupby('animal')  # group by a scalar key so each key is a plain string label
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot('day', 'count', label=key, ax=ax)
plt.legend(loc='best')
plt.show()
For the revised question, if you want the full dates on the x-axis, the simplest approach is to use Series.plot (as you did in your original code):
import matplotlib.pyplot as plt
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame({'animal': ['dog', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'rabbit', 'dog', 'dog', 'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'dog', 'dog', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'rabbit', 'dog', 'dog', 'dog', 'rabbit', 'rabbit', 'dog'], 'count': [6136, 14620, 16437, 17273, 15302, 15180, 7177, 16193, 8226, 9741, 26237, 12146, 12910, 25820, 29323, 17294, 15219, 26174, 27112, 27131, 28268, 34059, 39162, 38314, 19807, 20606, 21552, 36499, 42163, 30301], 'date': [Timestamp('2014-11-12 00:00:00'), Timestamp('2014-11-13 00:00:00'), Timestamp('2014-11-14 00:00:00'), Timestamp('2014-11-15 00:00:00'), Timestamp('2014-11-16 00:00:00'), Timestamp('2014-11-17 00:00:00'), Timestamp('2014-11-18 00:00:00'), Timestamp('2014-11-19 00:00:00'), Timestamp('2014-11-20 00:00:00'), Timestamp('2014-11-21 00:00:00'), Timestamp('2014-12-01 00:00:00'), Timestamp('2014-12-02 00:00:00'), Timestamp('2014-12-03 00:00:00'), Timestamp('2014-12-04 00:00:00'), Timestamp('2014-12-05 00:00:00'), Timestamp('2014-12-06 00:00:00'), Timestamp('2014-12-07 00:00:00'), Timestamp('2014-12-08 00:00:00'), Timestamp('2014-12-09 00:00:00'), Timestamp('2014-12-10 00:00:00'), Timestamp('2014-12-11 00:00:00'), Timestamp('2014-12-12 00:00:00'), Timestamp('2014-12-13 00:00:00'), Timestamp('2014-12-14 00:00:00'), Timestamp('2014-12-15 00:00:00'), Timestamp('2014-12-16 00:00:00'), Timestamp('2014-12-17 00:00:00'), Timestamp('2014-12-18 00:00:00'), Timestamp('2014-12-19 00:00:00'), Timestamp('2014-12-20 00:00:00')]})
df = df.set_index('date')
grouped = df.groupby('animal')  # group by a scalar key so each key is a plain string label
fig, ax = plt.subplots()
for key, group in grouped:
    group['count'].plot(label=key, ax=ax)
plt.legend(loc='best')
plt.show()
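The question's edit mentions trouble sorting by month and day. As a minimal sketch (the toy data and the names toy, by_columns and by_index are assumptions for illustration, not from the original post), sort_values accepts a list of columns, and for dates that fall within a single year this gives the same order as sorting the DatetimeIndex directly:

```python
import pandas as pd

# Hypothetical toy data (not from the question), with dates out of order.
toy = pd.DataFrame(
    {"count": [3, 1, 2, 4]},
    index=pd.to_datetime(["2014-12-02", "2014-11-20", "2014-12-01", "2014-12-10"]),
)
toy["month"] = toy.index.month
toy["day"] = toy.index.day

# Sort by month first, then by day within each month.
by_columns = toy.sort_values(by=["month", "day"])

# Sorting the DatetimeIndex itself is chronological, so no helper columns are needed.
by_index = toy.sort_index()
```

When the dates are already the index, as in the code above, sort_index is probably the simpler choice, and it also stays correct if the data ever spans more than one year.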