I have a DataFrame (imported from Excel) that looks like this:
Date Period
0 2017-03-02 2017-03-01 00:00:00
1 2017-03-02 2017-04-01 00:00:00
2 2017-03-02 2017-05-01 00:00:00
3 2017-03-02 2017-06-01 00:00:00
4 2017-03-02 2017-07-01 00:00:00
5 2017-03-02 2017-08-01 00:00:00
6 2017-03-02 2017-09-01 00:00:00
7 2017-03-02 2017-10-01 00:00:00
8 2017-03-02 2017-11-01 00:00:00
9 2017-03-02 2017-12-01 00:00:00
10 2017-03-02 Q217
11 2017-03-02 Q317
12 2017-03-02 Q417
13 2017-03-02 Q118
14 2017-03-02 Q218
15 2017-03-02 Q318
16 2017-03-02 Q418
17 2017-03-02 2018
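I'm trying to convert the whole 'Period' column to one consistent format. Some elements already look like datetimes, others came in as strings (e.g. Q217), and others as ints (e.g. 2018). What is the fastest way to convert everything to datetime? I tried using a mask, like this: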
# rows whose Period is a quarter string such as 'Q217'
mask = df['Period'].str.startswith('Q', na=False)
list_quarter = df[mask]['Period'].tolist()
# quarter number -> day/month of the quarter's last day
quarter_convert = {'1': '31/03', '2': '30/06', '3': '30/09', '4': '31/12'}
counter = 0
for element in list_quarter:
    element = element[1:]   # drop the leading 'Q': 'Q217' -> '217'
    quarter = element[0]    # '2'
    year = element[1:]      # '17'
    daymonth = ''.join(str(quarter_convert.get(word, word)) for word in quarter)
    final = daymonth + '/' + year   # '30/06/17'
    list_quarter[counter] = final
    counter += 1
But when I try to put the modified elements back into the original column, it fails:
df['Period'] = np.where(mask, pd.Series(list_quarter), df['Period'])
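Of course, I need to do roughly the same thing for the 2018-type values. However, I'm sure I'm missing something here and there should be a faster solution. Any fresh ideas would help! Thanks.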
Answer (score: 1)
Reusing the code you showed, let's first write a function that converts the Q-strings to a date string (I adjusted it to the final format):
def convert_q_string(element):
    # quarter number -> month-day of the quarter's last day
    quarter_convert = {'1': '03-31', '2': '06-30', '3': '09-30', '4': '12-31'}
    element = element[1:]       # drop the leading 'Q': 'Q217' -> '217'
    quarter = element[0]        # '2'
    year = element[1:]          # '17'
    daymonth = quarter_convert[quarter]
    final = '20' + year + '-' + daymonth   # '2017-06-30'
    return final
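A hand-written lookup table for quarter ends is easy to get wrong (Q3 ends on 30 September, Q4 on 31 December). As an alternative, pandas' quarterly periods can compute the quarter-end date for you. This is a minimal sketch, assuming every Q-string has the form Q<quarter digit><two-digit year> and that all years are in 2000-2099; the helper name `q_strings_to_quarter_end` is hypothetical.

```python
import pandas as pd


def q_strings_to_quarter_end(q: pd.Series) -> pd.Series:
    """Map strings like 'Q217' to the last day of that quarter (hypothetical helper).

    Assumes the pattern Q<quarter digit><two-digit year> and years 2000-2099.
    """
    # 'Q217' -> '2017Q2', a quarter label pandas can parse as a Period
    labels = '20' + q.str[2:] + 'Q' + q.str[1]
    periods = pd.PeriodIndex(labels, freq='Q')
    # end_time is the last nanosecond of each quarter; normalize() drops the time part
    return pd.Series(periods.end_time.normalize(), index=q.index)


print(q_strings_to_quarter_end(pd.Series(['Q217', 'Q317', 'Q418', 'Q118'])))
```

Because the period arithmetic is done by pandas, the quarter boundaries do not depend on a manually maintained dictionary.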
We can now use it to convert all the Q-strings first, and then let pd.to_datetime turn every element into a proper datetime value:
In [2]: s = pd.Series(['2017-03-01 00:00:00', 'Q217', '2018'])
In [3]: mask = s.str.startswith('Q')
In [4]: s[mask] = s[mask].map(convert_q_string)
In [5]: s
Out[5]:
0 2017-03-01 00:00:00
1 2017-06-30
2 2018
dtype: object
In [6]: pd.to_datetime(s)
Out[6]:
0 2017-03-01
1 2017-06-30
2 2018-01-01
dtype: datetime64[ns]
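The session above uses a Series of strings, whereas a column read from Excel typically mixes datetime objects, 'Q..' strings and plain ints such as 2018. The sketch below handles those mixed types by mapping every cell to one 'YYYY-MM-DD' string and then parsing with a single explicit format. The DataFrame is a hypothetical reconstruction of the question's data, `to_iso_date` is a hypothetical helper, and it assumes there are no missing values in the column. Series.map returns a Series with the same length and index as the column, so the result can be assigned straight back; by contrast, `pd.Series(list_quarter)` in the question's `np.where` call is shorter than the mask, so the arrays cannot be broadcast together.

```python
import datetime as dt
import numbers

import pandas as pd

# Hypothetical reconstruction of the mixed 'Period' column from the question:
# Excel gives datetime objects, 'Q..' strings and a plain int year.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2017-03-02'] * 4),
    'Period': [dt.datetime(2017, 3, 1), 'Q217', 'Q418', 2018],
})

# quarter number -> month-day of the quarter's last day
QUARTER_END = {'1': '03-31', '2': '06-30', '3': '09-30', '4': '12-31'}


def to_iso_date(value) -> str:
    """Turn one mixed-type cell into a 'YYYY-MM-DD' string (hypothetical helper)."""
    if isinstance(value, str) and value.startswith('Q'):
        # 'Q217' -> '2017-06-30'; assumes two-digit years in the 2000s
        return '20' + value[2:] + '-' + QUARTER_END[value[1]]
    if isinstance(value, numbers.Real) or (isinstance(value, str) and value.isdigit()):
        # a bare year such as 2018 or '2018' -> first day of that year
        return f'{int(value):04d}-01-01'
    # datetime objects and full date strings already carry a complete date
    return pd.Timestamp(value).strftime('%Y-%m-%d')


# Every cell now has the same layout, so one explicit format parses them all
df['Period'] = pd.to_datetime(df['Period'].map(to_iso_date), format='%Y-%m-%d')
print(df)
```

Passing an explicit format avoids relying on format inference; recent pandas versions (2.0 and later) infer one format from the first element and can reject a column whose strings use different layouts.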