Pandas - populate one DataFrame column based on another column

Time: 2020-01-04 14:45:41

Tags: python pandas

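Based on the Month2 column, I need to determine the values of the Forecast_for and Forecast_in columns. I wrote the code below to find the positions of the words forecast and in, and then used str functions to extract the relevant values, but I only get NaN values. Can anyone help? Please let me know if there is a better way. The end goal is to convert the Forecast_for/Forecast_in columns into a numeric year and month; for example, December 2018/19 should eventually become Forecast_for_Year = 2018 and Forecast_for_Month = 12.

Thanks!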

import pandas as pd

data = {'Month2': ['December 2018/19 forecast in November 2018/19',
                   'January 2018/19 forecast in November 2018/19',
                   'March 2018/19 forecast in November 2018/19',
                   'June 2019/20 forecast in May 2019/20'],
        'len_month2': ['', '', '', ''],
        'pos_forecast': ['', '', '', ''],
        'pos_in': ['', '', '', ''],
        'Forecast_for': ['', '', '', ''],
        'Forecast_in': ['', '', '', ''],
        'Forecast_for_Year': ['', '', '', ''],
        'Forecast_for_Month': ['', '', '', ''],
        'Forecast_in_Year': ['', '', '', ''],
        'Forecast_in_Month': ['', '', '', '']}
df = pd.DataFrame(data, columns=['Month2', 'len_month2', 'pos_forecast', 'pos_in',
                                 'Forecast_for', 'Forecast_in',
                                 'Forecast_for_Year', 'Forecast_for_Month',
                                 'Forecast_in_Year', 'Forecast_in_Month'])

# Calculate Forecast_for
df['pos_forecast'] = df['Month2'].str.find('forecast')
# NOTE: this is where the NaN values come from: .str[:stop] applies one scalar
# bound to every row, and passing a whole Series as the bound is not supported
df['Forecast_for'] = df['Month2'].str[:df['pos_forecast']]

# Calculate Forecast_in
df['pos_in'] = df['Month2'].str.find('in')
df['len_month2'] = df['Month2'].str.len()
# NOTE: same Series-as-bound problem; the start would also need to be pos_in + 3
df['Forecast_in'] = df['Month2'].str[(df['len_month2'] - df['pos_in']):]
print(df)
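For reference, the position-based idea in the question can work if each string is sliced with its own bounds, because the .str[...] accessor only takes scalar bounds. A minimal sketch, assuming every value contains the literal separator ' forecast in ' (the names marker and pos are illustrative, not from the question):

```python
import pandas as pd

df = pd.DataFrame({'Month2': [
    'December 2018/19 forecast in November 2018/19',
    'June 2019/20 forecast in May 2019/20',
]})

# Hypothetical separator: assumes every row contains exactly this text
marker = ' forecast in '

# Per-row start position of the separator (str.find returns -1 if it is missing)
pos = df['Month2'].str.find(marker)

# Slice each string with its own position; .str[...] only accepts scalar bounds
df['Forecast_for'] = [s[:p] for s, p in zip(df['Month2'], pos)]
df['Forecast_in'] = [s[p + len(marker):] for s, p in zip(df['Month2'], pos)]
print(df[['Forecast_for', 'Forecast_in']])
```

The list comprehensions run in plain Python, which is fine for small frames; the regex answers below stay inside the vectorized .str methods.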

2 answers:

Answer 0 (score: 2)

You can use the following to extract Forecast_for and Forecast_in:

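# First "Month YYYY/YY" in the string; expand=False returns a Series
df['Forecast_for'] = df['Month2'].str.extract(r'(\w+\s[\d\/]+)', expand=False)
# The "Month YYYY/YY" anchored at the end of the string
df['Forecast_in'] = df['Month2'].str.extract(r'(\w+\s[\d\/]+$)', expand=False)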
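Both pieces can also be pulled out in a single str.extract call: with named groups, the returned DataFrame uses the group names as column names. A sketch, assuming every value follows the exact layout 'Month YYYY/YY forecast in Month YYYY/YY' (rows that do not match come back as NaN):

```python
import pandas as pd

df = pd.DataFrame({'Month2': [
    'December 2018/19 forecast in November 2018/19',
    'January 2018/19 forecast in November 2018/19',
    'June 2019/20 forecast in May 2019/20',
]})

# Named groups become the column names of the frame returned by str.extract
pattern = r'^(?P<Forecast_for>\w+ [\d/]+) forecast in (?P<Forecast_in>\w+ [\d/]+)$'
parts = df['Month2'].str.extract(pattern)

# Attach the two extracted columns to the original frame
df = df.join(parts)
print(df)
```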

Update

# Parse each column once, then take the year and month from the datetimes
for_dates = pd.to_datetime(df['Forecast_for'])
in_dates = pd.to_datetime(df['Forecast_in'])
df['Forecast_in_Year'] = in_dates.dt.year
df['Forecast_in_Month'] = in_dates.dt.month
df['Forecast_for_Year'] = for_dates.dt.year
df['Forecast_for_Month'] = for_dates.dt.month
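The update relies on pd.to_datetime interpreting strings such as 'November 2018/19', where '/19' belongs to a year range rather than being a date component. If you would rather not depend on how the parser reads that suffix, here is a sketch that extracts the month name and the first year explicitly. It assumes every value has the form 'Month YYYY/YY' and that the first year is the one wanted, as in the question's December 2018/19 → 2018 example:

```python
import pandas as pd

df = pd.DataFrame({'Forecast_for': ['December 2018/19', 'January 2018/19', 'June 2019/20']})

# Split "December 2018/19" into the month name and the first (starting) year
parts = df['Forecast_for'].str.extract(r'^(?P<month>[A-Za-z]+) (?P<year>\d{4})/\d{2}$')

# format='%B' parses full English month names; only the month number is kept
df['Forecast_for_Month'] = pd.to_datetime(parts['month'], format='%B').dt.month
df['Forecast_for_Year'] = parts['year'].astype(int)
print(df)
```

The same two lines can be repeated for Forecast_in to fill Forecast_in_Month and Forecast_in_Year.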

Output

       Month2                                         len_month2  pos_forecast  pos_in  Forecast_for      Forecast_in       Forecast_for_Year  Forecast_for_Month  Forecast_in_Year  Forecast_in_Month
    0  December 2018/19 forecast in November 2018/19          45            17      26  December 2018/19  November 2018/19               2018                  12              2018                 11
    1  January 2018/19 forecast in November 2018/19           44            16      25  January 2018/19   November 2018/19               2018                   1              2018                 11
    2  March 2018/19 forecast in November 2018/19             42            14      23  March 2018/19     November 2018/19               2018                   3              2018                 11
    3  June 2019/20 forecast in May 2019/20                   36            13      22  June 2019/20      May 2019/20                    2019                   6              2019                  5

Answer 1 (score: 1)

You can also try splitting the string.

    # Whitespace tokens: 0 = month, 1 = year range, 4 = month, 5 = year range
    df['Forecast_for_Month'] = df['Month2'].str.split().str[0]
    df['Forecast_in_Month'] = df['Month2'].str.split().str[4]
    df['Forecast_for_Year'] = df['Month2'].str.split().str[1]
    df['Forecast_in_Year'] = df['Month2'].str.split().str[5]
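The split approach leaves strings such as 'December' and '2018/19', while the question asks for numeric years and months. A sketch that splits once and converts the tokens, assuming the fixed token layout 'Month YYYY/YY forecast in Month YYYY/YY' and that the first year of each range is the one wanted:

```python
import pandas as pd

df = pd.DataFrame({'Month2': [
    'December 2018/19 forecast in November 2018/19',
    'June 2019/20 forecast in May 2019/20',
]})

# Split once and reuse the token lists instead of splitting four times
tokens = df['Month2'].str.split()

# Month names -> month numbers (full English month names, format '%B')
df['Forecast_for_Month'] = pd.to_datetime(tokens.str[0], format='%B').dt.month
df['Forecast_in_Month'] = pd.to_datetime(tokens.str[4], format='%B').dt.month

# '2018/19' -> 2018: keep the part before the slash and convert it to int
df['Forecast_for_Year'] = tokens.str[1].str.split('/').str[0].astype(int)
df['Forecast_in_Year'] = tokens.str[5].str.split('/').str[0].astype(int)
print(df)
```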