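Based on the Month2 column, I need to determine the values of the Forecast_for and Forecast_in columns. I wrote the code below to find the positions of the words forecast and in, and then used the str functions to slice out the relevant values. However, I only get NaN values. Can anyone help? Please also let me know if there is a better approach. The goal is to eventually convert the Forecast_for/Forecast_in columns into numeric years and months; for example, December 2018/19 should end up as Forecast_for_Year = 2018 and Forecast_for_Month = 12.
Thanks!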
import pandas as pd

data = {'Month2': ['December 2018/19 forecast in November 2018/19', 'January 2018/19 forecast in November 2018/19', 'March 2018/19 forecast in November 2018/19', 'June 2019/20 forecast in May 2019/20'],
'len_month2':['','','',''] ,
'pos_forecast': ['','','',''],
'pos_in': ['','','',''],
'Forecast_for': ['','','',''],
'Forecast_in': ['','','',''],
'Forecast_for_Year': ['','','',''],
'Forecast_for_Month': ['','','',''],
'Forecast_in_Year': ['','','',''],
'Forecast_in_Month': ['','','','']}
df = pd.DataFrame(data, columns = ['Month2', 'len_month2', 'pos_forecast', 'pos_in', 'Forecast_for', 'Forecast_in',
'Forecast_for_Year', 'Forecast_for_Month', 'Forecast_in_Year', 'Forecast_in_Month'])
#Calculate Forecast_for
df['pos_forecast'] = df['Month2'].str.find('forecast')
df['Forecast_for'] = df['Month2'].str[:df['pos_forecast']]
#Calculate Forecast_in
df['pos_in'] = df['Month2'].str.find('in')
df['len_month2'] = df['Month2'].str.len()
df['Forecast_in'] = df['Month2'].str[(df['len_month2'] - df['pos_in']):]
df
Answer 0 (score: 2)
You can extract Forecast_for and Forecast_in with the following:
df['Forecast_for'] = df['Month2'].str.extract(r'(\w+\s[\d\/]+)')
df['Forecast_in'] = df['Month2'].str.extract(r'(\w+\s[\d\/]+$)')
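To see what the two patterns match, here is a small sketch that applies them to a single string with the standard-library re module instead of pandas. The variable names are only illustrative. The first pattern returns the leftmost "word, whitespace, digits and slashes" run, while the trailing $ in the second pattern forces the match to end at the end of the string.

```python
import re

text = "December 2018/19 forecast in November 2018/19"

# Same patterns as above: a word, one whitespace character,
# then a run of digits and slashes.
first = re.search(r"(\w+\s[\d/]+)", text)   # leftmost match in the string
last = re.search(r"(\w+\s[\d/]+$)", text)   # match that must end the string

forecast_for = first.group(1)
forecast_in = last.group(1)
print(forecast_for)  # December 2018/19
print(forecast_in)   # November 2018/19
```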
Update
df.Forecast_in_Year = pd.to_datetime(df.Forecast_in).dt.year
df.Forecast_in_Month = pd.to_datetime(df.Forecast_in).dt.month
df.Forecast_for_Year = pd.to_datetime(df.Forecast_for).dt.year
df.Forecast_for_Month = pd.to_datetime(df.Forecast_for).dt.month
Output
   Month2                                         len_month2  pos_forecast  pos_in  Forecast_for      Forecast_in       Forecast_for_Year  Forecast_for_Month  Forecast_in_Year  Forecast_in_Month
0  December 2018/19 forecast in November 2018/19  45          17            26      December 2018/19  November 2018/19  2018               12                  2018              11
1  January 2018/19 forecast in November 2018/19   44          16            25      January 2018/19   November 2018/19  2018               1                   2018              11
2  March 2018/19 forecast in November 2018/19     42          14            23      March 2018/19     November 2018/19  2018               3                   2018              11
3  June 2019/20 forecast in May 2019/20           36          13            22      June 2019/20      May 2019/20       2019               6                   2019              5
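The conversion above relies on pd.to_datetime guessing the format of strings such as December 2018/19, and that guess may differ between pandas versions. A possible alternative, sketched here under the assumption that the year before the slash is the one you want (as in the output above), is to strip the /19 suffix and pass an explicit format. Note that %B expects English month names under the default locale.

```python
import pandas as pd

forecast_for = pd.Series(["December 2018/19", "June 2019/20"])

# Drop the trailing "/19" or "/20" so only "Month YYYY" remains.
cleaned = forecast_for.str.replace(r"/\d+$", "", regex=True)

# Parse with an explicit format instead of letting pandas guess.
parsed = pd.to_datetime(cleaned, format="%B %Y")

forecast_for_year = parsed.dt.year
forecast_for_month = parsed.dt.month
print(forecast_for_year.tolist())   # [2018, 2019]
print(forecast_for_month.tolist())  # [12, 6]
```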
Answer 1 (score: 1)
You can also try splitting the string.
df['Forecast_for_Month'] = df['Month2'].str.split().str[0]
df['Forecast_in_Month'] = df['Month2'].str.split().str[4]
df['Forecast_for_Year'] = df['Month2'].str.split().str[1]
df['Forecast_in_Year'] = df['Month2'].str.split().str[5]
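Splitting this way yields month names and year strings such as 2018/19 rather than numbers. If numeric values are needed, one way to finish the job might look like the sketch below. It assumes English month names, assumes that the year before the slash is the one wanted, and uses the hypothetical helper names month_number and parts.

```python
import calendar
import pandas as pd

df = pd.DataFrame({"Month2": ["December 2018/19 forecast in November 2018/19",
                              "June 2019/20 forecast in May 2019/20"]})

# Map English month names to numbers: {"January": 1, ..., "December": 12}.
month_number = {name: i for i, name in enumerate(calendar.month_name) if name}

# One column per whitespace-separated token of Month2.
parts = df["Month2"].str.split(expand=True)

df["Forecast_for_Month"] = parts[0].map(month_number)
df["Forecast_in_Month"] = parts[4].map(month_number)
# Keep only the part before the slash, e.g. "2018/19" -> 2018.
df["Forecast_for_Year"] = parts[1].str.split("/").str[0].astype(int)
df["Forecast_in_Year"] = parts[5].str.split("/").str[0].astype(int)
```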