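I have a pandas dataset in which I am trying to relate two columns. One (df['IssueDatetime']) is properly formatted as a datetime; the other (df['forecastTime']) only has the day and hour, in %d/%H format: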
IssueDatetime Regions forecastTime WindDirSpeed
0 2019-01-01 06:00:00 EAST COAST 01/06 NW25
1 2019-01-01 06:00:00 EAST COAST 01/15 SW15
2 2019-01-01 06:00:00 EAST COAST 02/00 SE25
3 2019-01-01 06:00:00 EAST COAST 02/06 SE35-45
4 2019-01-01 06:00:00 EAST COAST 02/15 SW40
... ... ... ... ...
12292 2019-12-30 06:00:00 SOUTHEASTERN GRAND BANKS 01/00 N15-20
12293 2019-12-30 06:00:00 SOUTHWESTERN GRAND BANKS 30/06 NW15-20
12294 2019-12-30 06:00:00 SOUTHWESTERN GRAND BANKS 31/00 N25
12295 2019-12-30 06:00:00 SOUTHWESTERN GRAND BANKS 31/15 N15-20
12296 2019-12-30 06:00:00 SOUTHWESTERN GRAND BANKS 01/00 VRB10-15
Is it possible to relate df['IssueDatetime'] to df['forecastTime'] so that the result looks like this:
IssueDatetime Regions forecastTime WindDirSpeed
0 2019-01-01 06:00:00 EAST COAST 2019-01-01 06:00:00 NW25
1 2019-01-01 06:00:00 EAST COAST 2019-01-01 15:00:00 SW15
2 2019-01-01 06:00:00 EAST COAST 2019-01-02 00:00:00 SE25
3 2019-01-01 06:00:00 EAST COAST 2019-01-02 06:00:00 SE35-45
The problem arises when relating the columns at the end of a month. Any suggestions would be helpful.
Answer 0 (score: 0)
Try this:
df['IssueDatetime'] = pd.to_datetime(df['IssueDatetime'])
df['forecastTime'] = pd.to_datetime(df['forecastTime'], format='%d/%H')
df['forecastTime'] = df['forecastTime'].astype(str).str.replace('1900', '2019')
print(df)
IssueDatetime Regions forecastTime WindDirSpeed
0 2019-01-01 06:00:00 EAST COAST 2019-01-01 06:00:00 NW25
1 2019-01-01 06:00:00 EAST COAST 2019-01-01 15:00:00 SW15
2 2019-01-01 06:00:00 EAST COAST 2019-01-02 00:00:00 SE25
3 2019-01-01 06:00:00 EAST COAST 2019-01-02 06:00:00 SE35-45
4 2019-01-01 06:00:00 EAST COAST 2019-01-02 15:00:00 SW40
5 2019-12-30 06:00:00 SOUTHEASTERN GRAND BANKS 2019-01-01 00:00:00 N15-20
6 2019-12-30 06:00:00 SOUTHWESTERN GRAND BANKS 2019-01-30 06:00:00 NW15-20
7 2019-12-30 06:00:00 SOUTHWESTERN GRAND BANKS 2019-01-31 00:00:00 N25
8 2019-12-30 06:00:00 SOUTHWESTERN GRAND BANKS 2019-01-31 15:00:00 N15-20
9 2019-12-30 06:00:00 SOUTHWESTERN GRAND BANKS 2019-01-01 00:00:00 VRB10-15
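Note that the rows issued on 2019-12-30 come out dated January 2019 in the output above. A minimal sketch of the likely reason, using two sample values from the question: when the format string has no year or month, the parser fills in 1900 and January, so replacing '1900' with '2019' fixes the year only.

```python
import pandas as pd

# Parse two forecast strings that carry only day and hour.
parsed = pd.to_datetime(pd.Series(["30/06", "01/00"]), format="%d/%H")

# The missing year and month fall back to 1900 and January.
print(parsed.dt.year.tolist())   # [1900, 1900]
print(parsed.dt.month.tolist())  # [1, 1]
print(parsed.dt.day.tolist())    # [30, 1]
```

The month therefore has to come from IssueDatetime rather than from the parsed value.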
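Answer 1 (score: 0)
This is similar to the previous answer, but with 2 modifications. First, forecastTime is built from the IssueDatetime timestamps, but with the day and hour values replaced. Second, relativedelta ensures that forecasts roll over into the next month (I assume that is what you want?).
import pandas as pd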
from dateutil.relativedelta import relativedelta
# Replicating your data
issuetimes = ['2019-01-01 06:00:00']*5 + ['2019-12-30 06:00:00']*5
forecasts = ['01/06', '01/15', '02/00', '02/06', '02/15',
             '01/00', '30/06', '31/00', '31/15', '01/00']

def replace_days_hours(row):
    # Take the issue timestamp and swap in the forecast's day and hour
    row['forecastTime'] = row['IssueDatetime'].replace(day=row['forecastTime'].day,
                                                       hour=row['forecastTime'].hour)
    # A forecast earlier than its issue time belongs to the following month
    if row['forecastTime'] < row['IssueDatetime']:
        row['forecastTime'] += relativedelta(months=1)
    return row

df = pd.DataFrame({'IssueDatetime': issuetimes, 'forecastTime': forecasts})
df['IssueDatetime'] = pd.to_datetime(df['IssueDatetime'])
df['forecastTime'] = pd.to_datetime(df['forecastTime'], format='%d/%H')
df = df.apply(replace_days_hours, axis=1)
Output:
IssueDatetime forecastTime
0 2019-01-01 06:00:00 2019-01-01 06:00:00
1 2019-01-01 06:00:00 2019-01-01 15:00:00
2 2019-01-01 06:00:00 2019-01-02 00:00:00
3 2019-01-01 06:00:00 2019-01-02 06:00:00
4 2019-01-01 06:00:00 2019-01-02 15:00:00
5 2019-12-30 06:00:00 2020-01-01 00:00:00
6 2019-12-30 06:00:00 2019-12-30 06:00:00
7 2019-12-30 06:00:00 2019-12-31 00:00:00
8 2019-12-30 06:00:00 2019-12-31 15:00:00
9 2019-12-30 06:00:00 2020-01-01 00:00:00
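For a larger frame, the same rollover rule can probably be applied without a row-wise apply. The following is only a sketch under stated assumptions: resolve_forecast_times is a hypothetical helper name, forecastTime still holds the raw 'dd/HH' strings, and IssueDatetime is already a datetime column. Unlike Timestamp.replace, this version adds day offsets to the start of the month, so a day number that does not exist in the issue month would spill into the following month instead of raising an error.

```python
import pandas as pd

def resolve_forecast_times(df):
    """Hypothetical helper: turn 'dd/HH' strings into full timestamps."""
    issue = df["IssueDatetime"]
    # Split "dd/HH" into integer day and hour columns.
    parts = df["forecastTime"].str.split("/", expand=True).astype(int)
    # Offset of the forecast from the first day of a month at 00:00.
    offset = (pd.to_timedelta(parts[0] - 1, unit="D")
              + pd.to_timedelta(parts[1], unit="h"))
    # First day of the issue month and of the month after it.
    this_month = issue.dt.to_period("M").dt.to_timestamp()
    next_month = this_month + pd.DateOffset(months=1)
    candidate = this_month + offset
    # Keep the candidate unless it precedes the issue time; then roll over.
    return candidate.where(candidate >= issue, next_month + offset)

df = pd.DataFrame({
    "IssueDatetime": pd.to_datetime(
        ["2019-01-01 06:00:00"] * 2 + ["2019-12-30 06:00:00"] * 3
    ),
    "forecastTime": ["01/06", "02/00", "30/06", "31/15", "01/00"],
})
df["forecastTime"] = resolve_forecast_times(df)
print(df)
```

On this sample the last row, issued on 2019-12-30 with forecast 01/00, resolves to 2020-01-01 00:00:00, matching the output of the apply-based version.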