Unable to resample and then plot a Pandas DataFrame

Date: 2017-10-28 16:47:01

Tags: python pandas plotly

I have been trying to plot a simple resampled series from a Pandas DataFrame. This is my initial code:

import pandas as pd
import numpy as np
from datetime import datetime, timedelta

# Extra plotly bits
import plotly
import plotly.plotly as py
import plotly.graph_objs as go

date_today = datetime.now()
days = pd.date_range(date_today, date_today + timedelta(56), freq='D')

np.random.seed(seed=1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({'date': days, 'value': data})

When I print df, I get this:

                         date  value
0  2017-10-28 17:13:23.867396     29
1  2017-10-29 17:13:23.867396     56
2  2017-10-30 17:13:23.867396     82
3  2017-10-31 17:13:23.867396     13
4  2017-11-01 17:13:23.867396     35
5  2017-11-02 17:13:23.867396     53
6  2017-11-03 17:13:23.867396     25
7  2017-11-04 17:13:23.867396     23
8  2017-11-05 17:13:23.867396     21
9  2017-11-06 17:13:23.867396     12
10 2017-11-07 17:13:23.867396     15
...
48 2017-12-15 17:13:23.867396      1
49 2017-12-16 17:13:23.867396     88
50 2017-12-17 17:13:23.867396     94
51 2017-12-18 17:13:23.867396     48
52 2017-12-19 17:13:23.867396     26
53 2017-12-20 17:13:23.867396     65
54 2017-12-21 17:13:23.867396     53
55 2017-12-22 17:13:23.867396     54
56 2017-12-23 17:13:23.867396     76

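I can plot this easily (the red line in the example plot below). The problem arises when I try to add a second data layer, a downsampled version of the value/date relationship (for example, one point every 5 days), and then plot it.

To do this, I created a resampled copy of the DataFrame: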

df_sampled = df.set_index('date').resample('5D').mean()

When I print df_sampled, I get:

                            value
date
2017-10-28 17:32:39.622881   43.0
2017-11-02 17:32:39.622881   26.8
2017-11-07 17:32:39.622881   26.6
2017-11-12 17:32:39.622881   59.4
2017-11-17 17:32:39.622881   66.8
2017-11-22 17:32:39.622881   33.6
2017-11-27 17:32:39.622881   27.8
2017-12-02 17:32:39.622881   64.4
2017-12-07 17:32:39.622881   43.2
2017-12-12 17:32:39.622881   64.4
2017-12-17 17:32:39.622881   57.2
2017-12-22 17:32:39.622881   65.0

After that, I can no longer plot the data; the date column seems to be broken. With plotly:

    x = df_sampled['date'],
    y = df_sampled['value'],

I get this error:

File "interpolation.py", line 36, in <module>
    x = df_sampled['date'],
...
KeyError: 'date'

How can I fix this? Basically, I am trying to create a plot in which the red line is my original data and the blue line is the downsampled, smoothed version.

--- Update ---

The answer provided below works, and I now get the result I was after.

1 answer:

Answer 0 (score: 2)

The problem is that date is not a column but the index, so you need:

x = df_sampled.index
y = df_sampled['value']
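
To see why the KeyError appears, it can help to inspect the resampled frame directly. The sketch below rebuilds a frame like the one in the question and checks where date ended up after set_index and resample. The fixed start date is an assumption made only so the run is reproducible; the question uses datetime.now().

```python
import numpy as np
import pandas as pd

# Assumption: a fixed start date instead of datetime.now(), for reproducibility.
days = pd.date_range("2017-10-28", periods=57, freq="D")
np.random.seed(1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({"date": days, "value": data})

# set_index moves 'date' out of the columns; resample keeps it as the index.
df_sampled = df.set_index("date").resample("5D").mean()

print(list(df_sampled.columns))       # the columns that are left
print(df_sampled.index.name)          # the name carried by the index
print("date" in df_sampled.columns)   # whether df_sampled['date'] can work

# The dates are still available, just through the index.
x = df_sampled.index
y = df_sampled["value"]
```

Since date only survives as the index name, df_sampled['date'] has nothing to look up, which matches the KeyError in the traceback.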

Or turn the index back into a date column with reset_index:

df_sampled = df_sampled.reset_index()
x = df_sampled['date']
y = df_sampled['value']
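
Putting the pieces together, a minimal end-to-end sketch might look like the following. It assumes a recent plotly release where plotly.graph_objects is available (the question imports the older plotly.graph_objs), and the helper name build_figure, the trace names, and the fixed start date are hypothetical choices for illustration, not part of the original post.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed start date instead of datetime.now(), for reproducibility.
days = pd.date_range("2017-10-28", periods=57, freq="D")
np.random.seed(1111)
data = np.random.randint(1, high=100, size=len(days))
df = pd.DataFrame({"date": days, "value": data})

# reset_index() turns the 'date' index back into an ordinary column.
df_sampled = df.set_index("date").resample("5D").mean().reset_index()


def build_figure(raw, sampled):
    """Return a plotly figure with the raw and the downsampled series."""
    # Imported here so the data preparation above runs without plotly installed.
    import plotly.graph_objects as go

    fig = go.Figure()
    # Original daily values as a red line.
    fig.add_trace(go.Scatter(x=raw["date"], y=raw["value"], mode="lines",
                             name="original", line=dict(color="red")))
    # 5-day means as a blue line; shape="spline" asks plotly to smooth it.
    fig.add_trace(go.Scatter(x=sampled["date"], y=sampled["value"], mode="lines",
                             name="5-day mean",
                             line=dict(color="blue", shape="spline")))
    return fig
```

Calling build_figure(df, df_sampled).show() should then draw both lines on one chart. Keeping date as a regular column in both frames means the two traces can be built the same way.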