I have the following weather time series data:
2016
Jan high avg low sum
1 27 21 14 0
2 27 20 14 0
3 26 20 14 0
4 26 21 15 0
5 26 21 17 0
6 26 21 17 0
7 26 20 14 0
8 27 20 14 0
9 25 22 19 0
10 22 19 17 0
11 25 19 13 0
12 24 19 13 0
13 24 19 13 0
14 25 19 14 0
15 26 20 14 0
16 26 20 14 0
17 27 20 13 0
18 26 19 13 0
19 25 19 14 0
20 23 20 17 3.05
21 22 19 16 0
22 20 17 14 0
23 21 17 13 0
24 22 17 11 0
25 23 17 11 0
26 22 16 10 0
27 25 18 11 0
28 18 17 14 0
29 25 19 14 0
30 24 19 13 0
31 26 21 16 0
2016
Feb high avg low sum
1 28 23 18 0
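The data runs from January 1, 2016 to January 1, 2018.
I would like to build a tidy time series dataset. My idea is to start a new dataframe each time a year line (2016, 2017, 2018) appears, build a separate dataframe for each year and month combination, and then append them all together.
I am very new to Python and would really appreciate some guidance. Thanks!
Edit: the data is supplied as a CSV file.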
Answer 0 (score: 0)
This code works for your problem. It is fairly long, but I think it will help you get more practice with Python and pandas.
import pandas as pd

# Data collection -> raw data as displayed in your question.
# Assumes each line lands in a single column (no commas inside the lines).
data = pd.read_csv("data_slice.csv", header=None)
lines = data[0].astype(str).values

# Positions of the month header lines (the ones containing "high")
positions = [i for i, line in enumerate(lines) if "high" in line]

# Collect one dataframe per month, then concatenate them at the end
frames = []
for index, pos in enumerate(positions):
    # The year is on the line above the month header
    year = lines[pos - 1].strip()
    # The month is the first whitespace-separated token of the header
    month = lines[pos].split()[0]
    # Daily rows stop before the year line that precedes the next header
    if index + 1 < len(positions):
        rows = lines[pos + 1:positions[index + 1] - 1]
    else:
        rows = lines[pos + 1:]
    # Split each row on runs of whitespace into the key measures
    col_df = pd.DataFrame([row.split() for row in rows],
                          columns=["day", "hi", "avr", "low", "sum"])
    col_df.insert(1, "year", year)
    col_df.insert(2, "month", month)
    frames.append(col_df)

# DataFrame.append was removed in pandas 2.0, so concatenate instead
final_df = pd.concat(frames, ignore_index=True)
print(final_df)
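
Since the goal is a tidy time series, a possible follow-up step is to turn the day, year and month columns into a real date index and make the measures numeric. The sketch below is only an illustration: the helper name to_time_series and the small sample frame are made up for this example, and it assumes English month abbreviations (Jan, Feb, ...) so that the %b format code can parse them.

```python
import pandas as pd


def to_time_series(df):
    """Hypothetical helper: build a date index from day/year/month and make measures numeric."""
    out = df.copy()
    # Build text such as "2016-Jan-20"; %b matches abbreviated month names (English locale assumed)
    date_text = (
        out["year"].astype(str) + "-" + out["month"].astype(str) + "-" + out["day"].astype(str)
    )
    out["date"] = pd.to_datetime(date_text, format="%Y-%b-%d")
    # The parsed columns are strings, so convert each measure to a number
    for col in ["hi", "avr", "low", "sum"]:
        out[col] = pd.to_numeric(out[col])
    # Drop the pieces that are now encoded in the date and index by date
    return out.drop(columns=["day", "year", "month"]).set_index("date").sort_index()


# Small sample shaped like the parsed frame (all values are strings, as after splitting)
sample = pd.DataFrame(
    {
        "day": ["20", "21", "1"],
        "year": ["2016", "2016", "2016"],
        "month": ["Jan", "Jan", "Feb"],
        "hi": ["23", "22", "28"],
        "avr": ["20", "19", "23"],
        "low": ["17", "16", "18"],
        "sum": ["3.05", "0", "0"],
    }
)
tidy = to_time_series(sample)
print(tidy)
```

With a date index in place, pandas time series tools such as resample or date-based slicing can be applied directly to the result.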