Here are the first 10 rows of the file.
Year Revenue
0 Jan-07 1757000
1 Feb-07 2052000
2 Mar-07 2747000
3 Apr-07 2308000
4 May-07 2289000
5 Jun-07 2322000
6 Jul-07 2310000
7 Aug-07 2049000
8 Sep-07 1862000
9 Oct-07 2006000
10 Nov-07 2061000
I started my code as shown below:
```
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib as mpl
%matplotlib inline
from pandas.plotting import register_matplotlib_converters
from pandas_datareader import data as pdr
from pandas.plotting import autocorrelation_plot
import seaborn as sns
from datetime import datetime
from datetime import timedelta
```
I then imported my data set:
```df = pd.read_csv('pathway.csv', sep=',')```
I wanted to see the data types in my file, so I used ```df.info()```, which printed:
RangeIndex: 144 entries, 0 to 143
Data columns (total 2 columns):
Year 144 non-null object
Recycled material sales revenue 144 non-null int64
dtypes: int64(1), object(1)
memory usage: 2.3+ KB
Then I tried to convert the dates to yyyy-mm-dd format with the code below, but it fails with OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-07 00:00:00
df.month = pd.to_datetime(df.month)
df.set_index('month', inplace=True)
I expect my data set to change to:
0 2007-01-01 1757000
1 2007-02-01 2052000
2 2007-03-01 2747000
3 2007-04-01 2308000
4 2007-05-01 2289000
5 2007-06-01 2322000
......
Once I complete this, I will plot a time series graph, with revenue ($) on the y-axis and the date on the x-axis.
Answer 0: (score: 0)
Fix the dates by adding the missing part? I think we can set the day to the first of each month.
Then convert them to datetime:
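# Prepend a day so "Jan-07" becomes "01-Jan-07"
df["Year"] = "01-" + df["Year"]
# Parse day, abbreviated month name, and two-digit year
df["Year"] = pd.to_datetime(df["Year"], format="%d-%b-%y")
df = df.set_index("Year")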
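
As a self-contained sketch of the same idea: an explicit format string can also parse the "Mon-YY" values directly, because a missing day defaults to the first of the month. The small DataFrame below is a hypothetical stand-in for the question's CSV, and the column names `Year` and `Revenue` are assumed from the sample rows shown in the question. The `%b` code also assumes English month abbreviations.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of the question's file.
df = pd.DataFrame(
    {
        "Year": ["Jan-07", "Feb-07", "Mar-07", "Apr-07"],
        "Revenue": [1757000, 2052000, 2747000, 2308000],
    }
)

# Parse "Mon-YY" with an explicit format; with no day in the string,
# the parsed date falls on the first of the month.
df["Year"] = pd.to_datetime(df["Year"], format="%b-%y")

# Use the parsed dates as the index so the frame behaves as a time series.
df = df.set_index("Year")

print(df)
```

From here, `df["Revenue"].plot()` should draw the series with the dates on the x-axis, assuming matplotlib is installed.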