Pandas and Matplotlib - fill_between() vs datetime64

Date: 2015-03-29 13:31:18

Tags: python pandas matplotlib

I have a Pandas DataFrame:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 300 entries, 5220 to 5519
Data columns (total 3 columns):
Date             300 non-null datetime64[ns]
A                300 non-null float64
B                300 non-null float64
dtypes: datetime64[ns](1), float64(2)
memory usage: 30.5 KB

I want to plot the A and B series against Date.

plt.plot_date(data['Date'], data['A'], '-')
plt.plot_date(data['Date'], data['B'], '-')

Then I want to apply fill_between() to the area between the A and B series:

plt.fill_between(data['Date'], data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)

This outputs:

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs
could not be safely coerced to any supported types according to the casting 
rule ''safe''

Does matplotlib accept pandas datetime64 objects in fill_between()? Should I convert the column to a different date type?

3 answers:

Answer 0 (score: 23)

Pandas registers a converter in matplotlib.units.registry that converts a number of datetime types (such as a pandas DatetimeIndex and NumPy arrays of dtype datetime64) to matplotlib datenums, but it does not handle a Pandas Series with dtype datetime64.

In [67]: import pandas.tseries.converter as converter

In [68]: c = converter.DatetimeConverter()

In [69]: type(c.convert(df['Date'].values, None, None))
Out[69]: numpy.ndarray                        # converted (good)

In [70]: type(c.convert(df['Date'], None, None))
Out[70]: pandas.core.series.Series            # left unchanged

fill_between checks for a converter and uses it to handle the data if one exists.
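To see the distinction the answer relies on without the internal pandas module, here is a minimal sketch. The DataFrame is a hypothetical stand-in for the one in the question, and the lookup through `matplotlib.units.registry.get_converter` assumes a reasonably recent matplotlib that registers its own converter for `np.datetime64`; which converter you get back depends on your installed versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates  # noqa: F401  (importing it registers matplotlib's date converters)
import matplotlib.units as munits
import pandas as pd

# Hypothetical stand-in for the question's DataFrame
data = pd.DataFrame({"Date": pd.date_range("2000-01-01", periods=5, freq="D")})

as_series = data["Date"]        # pandas Series with a datetime64 dtype
as_array = data["Date"].values  # plain NumPy array with a datetime64 dtype

print(type(as_series).__name__, as_series.dtype)
print(type(as_array).__name__, as_array.dtype)

# Ask matplotlib which unit converter (if any) it would pick for the array
converter = munits.registry.get_converter(as_array)
print(type(converter).__name__)
```

The point is only that `.values` hands matplotlib a bare NumPy array, which is the input type the answer reports as being converted.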

So, as a workaround, you can convert the dates to a NumPy array of datetime64 values before calling fill_between. For example:

d = data['Date'].values
plt.fill_between(d, data['A'], data['B'],
                where=data['A'] >= data['B'],
                facecolor='green', alpha=0.2, interpolate=True)
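For anyone who wants to try the workaround end to end, here is a self-contained sketch. The sine/cosine data and the daily date range are made up for illustration, and it assumes a matplotlib version that accepts NumPy datetime64 arrays; as the other answers note, behaviour has varied across pandas and matplotlib releases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data shaped like the question's DataFrame
N = 300
x = np.linspace(0, 2 * np.pi, N)
data = pd.DataFrame({
    "Date": pd.date_range("2000-01-01", periods=N, freq="D"),
    "A": np.sin(x),
    "B": np.cos(x),
})

# Pass plain NumPy arrays instead of pandas Series
d = data["Date"].values
a = data["A"].values
b = data["B"].values

fig, ax = plt.subplots()
ax.plot(d, a, "-")
ax.plot(d, b, "-")
ax.fill_between(d, a, b, where=a >= b,
                facecolor="green", alpha=0.2, interpolate=True)
fig.autofmt_xdate()  # rotate the date tick labels
fig.canvas.draw()    # force a render so conversion problems surface here
```

Call `fig.savefig(...)` or switch to an interactive backend to actually look at the result.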


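Answer 1 (score: 5)

As WillZ pointed out, Pandas 0.21 broke unutbu's solution. Converting datetimes to dates, however, can have a significant negative impact on data analysis. This solution currently works and keeps the datetimes: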

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
           'Date': dates})
d = data['Date'].dt.to_pydatetime()
plt.plot_date(d, data['A'], '-')
plt.plot_date(d, data['B'], '-')


plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
plt.xticks(rotation=25)
plt.show()

[Figure: fill_between with datetime64 constraint]

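Edit: Following jedi's comment, I set out to determine which of the three options below is fastest:

  • method1 = original answer
  • method2 = jedi's comment + original answer
  • method3 = jedi's comment

Method 2 was slightly faster and much more consistent, so I edited the answer above to reflect the best approach.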

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import time


N = 300
dates = pd.date_range('2000-1-1', periods=N, freq='ms')
x = np.linspace(0, 2*np.pi, N)
data = pd.DataFrame({'A': np.sin(x), 'B': np.cos(x),
           'Date': dates})
time_data = pd.DataFrame(columns=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'])
method1 = []
method2 = []
method3 = []
# method1: convert element by element with a list comprehension
for i in range(10):
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    for _ in range(500):
        d = [pd.Timestamp(x).to_pydatetime() for x in data['Date']]
        plt.plot_date(d, data['A'], '-')
        plt.plot_date(d, data['B'], '-')
        plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method1.append(time.perf_counter() - start)

# method2: convert once with .dt.to_pydatetime() and reuse the result
for i in range(10):
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    for _ in range(500):
        d = data['Date'].dt.to_pydatetime()
        plt.plot_date(d, data['A'], '-')
        plt.plot_date(d, data['B'], '-')
        plt.fill_between(d, data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method2.append(time.perf_counter() - start)

# method3: call .dt.to_pydatetime() inline at every use
for i in range(10):
    start = time.perf_counter()  # time.clock() was removed in Python 3.8
    for _ in range(500):
        plt.plot_date(data['Date'].dt.to_pydatetime(), data['A'], '-')
        plt.plot_date(data['Date'].dt.to_pydatetime(), data['B'], '-')
        plt.fill_between(data['Date'].dt.to_pydatetime(), data['A'], data['B'],
            where=data['A'] >= data['B'],
            facecolor='green', alpha=0.2, interpolate=True)
        plt.xticks(rotation=25)
        plt.gcf().clear()
    method3.append(time.perf_counter() - start)

time_data.loc['method1'] = method1
time_data.loc['method2'] = method2
time_data.loc['method3'] = method3
print(time_data)
plt.errorbar(time_data.index, time_data.mean(axis=1), yerr=time_data.std(axis=1))

[Figure: time test of 3 methods on converting time data for plotting a DataFrame]

Answer 2 (score: 4)
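If you only care about the cost of the conversion itself rather than the whole plotting loop, a smaller comparison with `timeit` may be enough. This is a sketch under stated assumptions, not a rerun of the benchmark above: the data is made up, the function names are hypothetical, and the vectorized variant uses `DatetimeIndex.to_pydatetime()` as a stand-in for `Series.dt.to_pydatetime()`, so check it against your pandas version.

```python
import timeit

import pandas as pd

N = 300
data = pd.DataFrame({"Date": pd.date_range("2000-01-01", periods=N, freq="D")})


def convert_listcomp():
    # One Python-level conversion per element
    return [pd.Timestamp(x).to_pydatetime() for x in data["Date"]]


def convert_vectorized():
    # Single call; returns a NumPy object array of datetime.datetime
    return pd.DatetimeIndex(data["Date"]).to_pydatetime()


for name, func in [("listcomp", convert_listcomp),
                   ("vectorized", convert_vectorized)]:
    # Best of 5 repeats, 50 conversions per repeat
    best = min(timeit.repeat(func, number=50, repeat=5))
    print(f"{name}: {best:.4f} s per 50 conversions")
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes; the absolute numbers will differ from machine to machine.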

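I ran into this problem after upgrading to Pandas 0.21. My code had worked fine with fill_between() before, but it broke after the upgrade.

It turns out that the fix mentioned in @unutbu's answer, which is what I already had, only works if the DatetimeIndex contains date objects rather than datetime objects that carry time information.

Looking at the example above, what I did to fix it was to add the following line before calling fill_between():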

d['Date'] = [z.date() for z in d['Date']]
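One thing to keep in mind with this fix is that `.date()` drops the time of day, so timestamps within the same day collapse onto one x value. A small sketch with three made-up timestamps shows the difference from keeping full datetimes:

```python
import datetime

import pandas as pd

# Three made-up timestamps, two of them on the same day
stamps = pd.Series(pd.to_datetime([
    "2000-01-01 06:00",
    "2000-01-01 18:00",
    "2000-01-02 06:00",
]))

as_dates = [z.date() for z in stamps]               # datetime.date, time of day dropped
as_datetimes = [z.to_pydatetime() for z in stamps]  # datetime.datetime, time of day kept

print(as_dates)
print(as_datetimes)
print(len(set(as_dates)), "distinct dates vs", len(set(as_datetimes)), "distinct datetimes")
```

For daily data this makes no difference, but for intraday data the conversion to dates loses information.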