I fetched some data from the web and loaded it into a pandas DataFrame:
import pandas as pd
%pylab inline
loc = 'https://blockchain.info/charts/hash-rate?showDataPoints=false&timespan=all&show_header=true&daysAverageString=1&scale=1&format=csv&address='
df = pd.read_csv(loc, parse_dates = True,
index_col = 0, skiprows = 1,
names = ['Date', 'Hash Rate (Gh/s)'])
Then I tried to plot it with the pandas df.plot command:
df['Hash Rate (Gh/s)'].plot(logy = True)
The resulting plot has unexpected oscillations,
but if I plot the same data with matplotlib
plt.semilogy(df['Hash Rate (Gh/s)'])
the oscillations are not there.
I tried using the pandas reindex function:
df_idx = pd.date_range(df.index[0], df.index[-1])
df = df.reindex(df_idx, fill_value=nan)
but so far I have not found a way to get rid of these spurious oscillations. How can I eliminate them, or reindex in pandas so that they go away?
Answer 0 (score: 1)
Your dates are not being parsed correctly: they are written with the day first. If you pass dayfirst=True to read_csv, that should fix the problem.
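To make the ambiguity concrete, here is a minimal sketch that parses the first timestamp from the CSV both ways. It uses pd.to_datetime on a single string rather than read_csv, which is an assumption made only to keep the example small; the dayfirst keyword carries the same meaning in both.

```python
import pandas as pd

# First timestamp in the CSV shown in this answer (day/month/year order).
raw = "03/01/2009 18:15:05"

# Default parsing treats an ambiguous date as month/day/year.
month_first = pd.to_datetime(raw)

# dayfirst=True tells pandas to read it as day/month/year instead.
day_first = pd.to_datetime(raw, dayfirst=True)

print(month_first)  # 2009-03-01 18:15:05
print(day_first)    # 2009-01-03 18:15:05
```

Only dates whose day is 12 or less can be misread as month/day, so the index ends up out of order, which is a plausible source of the zigzag in the plot.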
In [6]: df = pd.read_csv("ooo.csv", skiprows=1, names=['Date', 'Hash Rate (Gh/s)'], parse_dates=True, index_col=0, dayfirst=True)
In [7]: df.head(10)
Out[7]:
Hash Rate (Gh/s)
Date
2009-01-04 18:15:05 0.000000
2009-01-05 18:15:05 0.000000
2009-01-06 18:15:05 0.000000
2009-01-07 18:15:05 0.000000
2009-01-08 18:15:05 0.000000
2009-01-09 18:15:05 0.000696
2009-01-10 18:15:05 0.001541
2009-01-11 18:15:05 0.005269
2009-01-12 18:15:05 0.004424
2009-01-13 18:15:05 0.005717
[10 rows x 1 columns]
In [8]: !head ooo.csv
03/01/2009 18:15:05,0.00004971026962962963
04/01/2009 18:15:05,0.0
05/01/2009 18:15:05,0.0
06/01/2009 18:15:05,0.0
07/01/2009 18:15:05,0.0
08/01/2009 18:15:05,0.0
09/01/2009 18:15:05,0.0006959437748148148
10/01/2009 18:15:05,0.0015410183585185184
11/01/2009 18:15:05,0.005269288580740741
12/01/2009 18:15:05,0.004424213997037036
In [9]: df["Hash Rate (Gh/s)"].plot(logy=True)
Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0xc4ea58c>
which produces the corrected plot.
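As a sanity check, the sketch below parses the ten CSV rows shown above with an explicit format string and confirms that the resulting index is sorted and evenly spaced. Parsing with pd.to_datetime(..., format=...) after loading is an alternative to dayfirst=True, not what the answer itself does, and the in-memory csv_text stands in for the real file. Unlike the call above, it does not skip the first row.

```python
import io
import pandas as pd

# The ten rows printed by `!head ooo.csv` above (day/month/year order).
csv_text = """03/01/2009 18:15:05,0.00004971026962962963
04/01/2009 18:15:05,0.0
05/01/2009 18:15:05,0.0
06/01/2009 18:15:05,0.0
07/01/2009 18:15:05,0.0
08/01/2009 18:15:05,0.0
09/01/2009 18:15:05,0.0006959437748148148
10/01/2009 18:15:05,0.0015410183585185184
11/01/2009 18:15:05,0.005269288580740741
12/01/2009 18:15:05,0.004424213997037036
"""

# Load the dates as plain strings first, then parse them with an explicit format
# so there is no ambiguity about day/month order.
df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["Date", "Hash Rate (Gh/s)"])
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y %H:%M:%S")
df = df.set_index("Date")

# A correctly parsed daily series should have a sorted index with one-day steps.
steps = df.index.to_series().diff().dropna()
print(df.index.is_monotonic_increasing)
print((steps == pd.Timedelta(days=1)).all())
```

If the first check prints False on the full data set, some dates were read with day and month swapped.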