Out of curiosity, how do you fill in missing dates with blank values (for plotting purposes)?
data = '''day date tempMLD salinityMLD densityMLD
1 9/12/2014 177.859887 177.859887 177.859887
2 9/13/2014 197.2614444 197.2614444 197.2614444
3 9/14/2014 199.5787079 199.5787079 199.5787079
5 9/16/2014 197.2535 197.2535 197.2535
7 9/18/2014 195.9107222 195.9107222 195.9107222
8 9/19/2014 200.7785 200.7785 200.7785
10 9/21/2014 191.3220225 191.3220225 191.3220225
12 9/23/2014 179.5676966 179.5676966 179.5676966
13 9/24/2014 180.7201124 180.7201124 180.7201124
15 9/26/2014 170.139382 170.139382 170.139382
17 9/28/2014 171.7347753 171.7347753 171.7347753
18 9/29/2014 180.4120787 180.4120787 180.4120787
20 10/1/2014 221.9926404 221.9926404 221.9926404
22 10/3/2014 177.458764 177.458764 177.458764
23 10/4/2014 171.9423034 171.9423034 171.9423034
25 10/6/2014 195.6371348 195.6371348 195.6371348
27 10/8/2014 190.0867416 190.0867416 190.0867416
28 10/9/2014 171.4321348 171.4321348 171.4321348
30 10/11/2014 174.5272472 174.5272472 174.5272472
32 10/13/2014 198.0153889 198.0153889 198.0153889'''
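Currently it plots like this, because it is programmed to associate a letter with the first day of each month. Since the first of the month is missing, this is what happens. The raw data is in a CSV file, and I even tried building a df2 with the required date range: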
df = pd.read_csv('/content/drive/My Drive/Irminger_2020_Project_Colab_Notebooks/Apex_Array/Ready to Graph/profiler/MLD.csv',sep = ',',encoding='utf-8-sig',)
idx = pd.date_range('09-12-2014', '06-29-2019')
df['date'] = pd.to_datetime(df['date'])
s = df
s.index = pd.DatetimeIndex(s.index)
s = s.reindex(idx,)
s
But this does not seem to work, because it fills everything with NaN.
Answer 0 (score: 0)
First, some setup work to create the DataFrame: import the data and set the data types (proper dates, integers, floats).
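import pandas as pd

# `data` is the whitespace-separated table from the question, held as one string
columns = data.split('\n')[0].split()
records = list()
for record in data.split('\n')[1:]:
    records.append(record.split())
# Parse the date strings, use them as the index, and sort chronologically
df = (pd.DataFrame(data=records, columns=columns)
      .assign(date=lambda x: pd.to_datetime(x['date']))
      .set_index('date')
      .sort_index()
     )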
int_fields = ['day']
df[int_fields] = df[int_fields].astype(int)
float_fields = ['tempMLD', 'salinityMLD', 'densityMLD']
df[float_fields] = df[float_fields].astype(float)
With the date set as the index, reindex over the full daily range so that the missing dates appear as rows (filled with NaN):
idx = pd.date_range(start=df.index.min(), end=df.index.max())
df = df.reindex(index=idx)
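For comparison, the attempt in the question most likely returns all NaN because `pd.DatetimeIndex(s.index)` converts the default integer index (0, 1, 2, ...) into timestamps at the very start of 1970 instead of using the `date` column, so no label matches `idx`. A minimal sketch of that behaviour and one possible fix; the small `raw` frame is a made-up stand-in for the CSV, not the real file:

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question (three rows only).
raw = pd.DataFrame({
    "date": ["9/12/2014", "9/13/2014", "9/16/2014"],
    "tempMLD": [177.859887, 197.2614444, 197.2535],
})
raw["date"] = pd.to_datetime(raw["date"])
idx = pd.date_range("2014-09-12", "2014-09-16")

# The question's approach: the integer index 0, 1, 2 is interpreted as
# nanoseconds since 1970-01-01, so none of the labels fall inside idx.
broken = raw.copy()
broken.index = pd.DatetimeIndex(broken.index)
broken = broken.reindex(idx)

# Using the parsed date column as the index before reindexing keeps the
# existing rows and adds NaN rows only for the missing days.
fixed = raw.set_index("date").reindex(idx)

print(broken)
print(fixed)
```

Here `fixed` keeps the three original rows and gains NaN rows for 9/14 and 9/15, while `broken` is NaN throughout.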
Finally, interpolate each column to replace the NaN values:
for col in df.columns:
    df[col] = df[col].interpolate()
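Since the question asks for blank values, the interpolation step is optional: if the NaN rows are left alone, a matplotlib line plot breaks the line at each gap. Where interpolation is wanted, `DataFrame.interpolate()` can also be called once on the whole frame rather than per column. A small sketch with a made-up three-day frame (`gappy` is a hypothetical name, and the values are taken from the sample data):

```python
import numpy as np
import pandas as pd

# Three consecutive days with the middle one missing, as after reindexing.
idx = pd.date_range("2014-09-14", "2014-09-16")
gappy = pd.DataFrame(
    {"day": [3.0, np.nan, 5.0], "tempMLD": [199.5787079, np.nan, 197.2535]},
    index=idx,
)

# Linear interpolation (the default) across every numeric column at once.
filled = gappy.interpolate()

# gappy is unchanged and still holds NaN for 9/15, which is the form to
# plot when the gap should stay visible.
print(gappy)
print(filled)
```

Note that `day` becomes a float column as soon as the frame contains NaN, which is why it prints as 1.0, 2.0, and so on.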
Now we see 9/15, which was not in the original data:
print(df.head())
            day     tempMLD  salinityMLD  densityMLD
2014-09-12  1.0  177.859887   177.859887  177.859887
2014-09-13  2.0  197.261444   197.261444  197.261444
2014-09-14  3.0  199.578708   199.578708  199.578708
2014-09-15  4.0  198.416104   198.416104  198.416104
2014-09-16  5.0  197.253500   197.253500  197.253500