Unexpected oscillations when using pandas plot()

Posted: 2013-11-30 22:28:44

Tags: python matplotlib plot pandas

I pulled some data from the web and loaded it into a pandas DataFrame:

import pandas as pd
%pylab inline

loc = 'https://blockchain.info/charts/hash-rate?showDataPoints=false&timespan=all&show_header=true&daysAverageString=1&scale=1&format=csv&address='
df = pd.read_csv(loc, parse_dates = True, 
                 index_col = 0, skiprows = 1, 
                 names = ['Date', 'Hash Rate (Gh/s)'])

Then I tried to plot it with the pandas df.plot command:
df['Hash Rate (Gh/s)'].plot(logy = True)

The plot I get has unexpected oscillations:

[Figure: plot made with pandas]

But if I plot the same data with matplotlib:

plt.semilogy(df['Hash Rate (Gh/s)'])

[Figure: plot made with matplotlib]

the oscillations are not there.

I tried using the pandas reindex function:

df_idx = pd.date_range(df.index[0], df.index[-1])
df = df.reindex(df_idx, fill_value=nan) 

but so far I have not found any way to get rid of these spurious oscillations. How can I remove them, or reindex in pandas so that they go away?

1 answer:

Answer 0 (score: 1)

Your dates are not being parsed correctly: they are written day-first (DD/MM/YYYY). Passing dayfirst=True to read_csv should fix the problem.
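One plausible reason the mis-parse shows up as oscillations rather than a cleanly shifted curve: if each string is parsed on its own with a month-first guess (roughly how older pandas versions fell back to dateutil; treat that as an assumption about the 2013 release), then days 1 to 12 are read as months, while days 13 and up can only be days. The timeline therefore jumps back and forth, and a line plot draws it as a zigzag. This sketch uses dateutil.parser.parse directly; the rows for the 13th and 14th are hypothetical continuations of the CSV.

```python
from dateutil import parser

# Day-first timestamps in the same layout as the CSV
# (the 13th and 14th are assumed rows, not copied from the file)
stamps = ["11/01/2009 18:15:05", "12/01/2009 18:15:05",
          "13/01/2009 18:15:05", "14/01/2009 18:15:05"]

# Parse each string independently; dateutil guesses month-first by default
parsed = [parser.parse(s) for s in stamps]
for s, d in zip(stamps, parsed):
    print(s, "->", d.date())
# 11/01/2009 18:15:05 -> 2009-11-01
# 12/01/2009 18:15:05 -> 2009-12-01
# 13/01/2009 18:15:05 -> 2009-01-13
# 14/01/2009 18:15:05 -> 2009-01-14

# The parsed sequence is not chronological: it jumps from December back to January
print(parsed == sorted(parsed))  # False
```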

In [6]: df = pd.read_csv("ooo.csv", skiprows=1, names=['Date', 'Hash Rate (Gh/s)'], parse_dates=True, index_col=0, dayfirst=True)

In [7]: df.head(10)
Out[7]: 
                     Hash Rate (Gh/s)
Date                                 
2009-01-04 18:15:05          0.000000
2009-01-05 18:15:05          0.000000
2009-01-06 18:15:05          0.000000
2009-01-07 18:15:05          0.000000
2009-01-08 18:15:05          0.000000
2009-01-09 18:15:05          0.000696
2009-01-10 18:15:05          0.001541
2009-01-11 18:15:05          0.005269
2009-01-12 18:15:05          0.004424
2009-01-13 18:15:05          0.005717

[10 rows x 1 columns]

In [8]: !head ooo.csv
03/01/2009 18:15:05,0.00004971026962962963
04/01/2009 18:15:05,0.0
05/01/2009 18:15:05,0.0
06/01/2009 18:15:05,0.0
07/01/2009 18:15:05,0.0
08/01/2009 18:15:05,0.0
09/01/2009 18:15:05,0.0006959437748148148
10/01/2009 18:15:05,0.0015410183585185184
11/01/2009 18:15:05,0.005269288580740741
12/01/2009 18:15:05,0.004424213997037036

In [9]: df["Hash Rate (Gh/s)"].plot(logy=True)
Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0xc4ea58c>

which produces

[Figure: plot without the oscillations]
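To check the effect of dayfirst without downloading the file, the sketch below feeds the ten rows printed by !head ooo.csv into read_csv through io.StringIO (a stand-in for the real URL) and compares the resulting index with and without dayfirst=True. It drops the answer's skiprows=1 so that all ten rows are kept; everything else mirrors the answer's read_csv call.

```python
import io
import pandas as pd

# First rows of ooo.csv exactly as printed by `!head ooo.csv` (DD/MM/YYYY, no header)
raw = """03/01/2009 18:15:05,0.00004971026962962963
04/01/2009 18:15:05,0.0
05/01/2009 18:15:05,0.0
06/01/2009 18:15:05,0.0
07/01/2009 18:15:05,0.0
08/01/2009 18:15:05,0.0
09/01/2009 18:15:05,0.0006959437748148148
10/01/2009 18:15:05,0.0015410183585185184
11/01/2009 18:15:05,0.005269288580740741
12/01/2009 18:15:05,0.004424213997037036
"""

def load(dayfirst):
    # Same options as the answer's read_csv call, minus skiprows, reading from a string
    return pd.read_csv(io.StringIO(raw), names=["Date", "Hash Rate (Gh/s)"],
                       parse_dates=True, index_col=0, dayfirst=dayfirst)

wrong = load(dayfirst=False)
right = load(dayfirst=True)

# Month-first reading turns ten consecutive days into the 1st of March..December
print(wrong.index[0], wrong.index[-1])  # 2009-03-01 18:15:05 2009-12-01 18:15:05
# Day-first reading gives the intended 3..12 January
print(right.index[0], right.index[-1])  # 2009-01-03 18:15:05 2009-01-12 18:15:05
# Cheap sanity check before plotting: the x values should be in order
print(right.index.is_monotonic_increasing)  # True
```

On the full file, index.is_monotonic_increasing is a quick way to spot this kind of mis-parse before calling plot(), since a line plot only doubles back when its x values are out of order.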