Fill column date values until another date value is reached, then continue filling with the newly reached value

Asked: 2018-06-10 10:28:36

Tags: python pandas dataframe autofill

I have the following DataFrame:

         Date                 Team 1                Team 2  Score1  Score2
0    1-Oct-17                      1                   NaN       2     NaN
1    1-Oct-17          Chicago Cubs        Cincinnati Reds       1     3.0
2    1-Oct-17    Kansas City Royals   Arizona Diamondbacks       2    14.0
3    1-Oct-17    St.Louis Cardinals      Milwaukee Brewers       1     6.0
4   30-Sep-17                      1                   NaN       2     NaN
5   30-Sep-17     St.Louis Cardinals     Milwaukee Brewers       7     6.0
6   30-Sep-17           Chicago Cubs       Cincinnati Reds       9     0.0
7   30-Sep-17  San Francisco Giants       San Diego Padres       2     3.0
8   30-Sep-17         Boston Red Sox        Houston Astros       6     3.0
9   29-Sep-17                      1                   NaN       2     NaN
10  29-Sep-17           Chicago Cubs       Cincinnati Reds       5     4.0
11  29-Sep-17       New York Yankees     Toronto Blue Jays       4     0.0
12  29-Sep-17    Kansas City Royals         Detroit Tigers       1     4.0
13  29-Sep-17      Chicago White Sox    Los Angeles Angels       5     4.0

I need to fill in the date values, replacing the time values, to get this result:

{{1}}

1 answer:

Answer 0 (score: 1)

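You can check the length of the values in the Date column: where keeps the values longer than 7 characters and replaces the rest with NaN, and ffill then forward-fills the missing values with the last valid date (the same as fillna with method ffill):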

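# keep values longer than 7 characters (the dates), set the rest to NaN, then forward-fill
df['Date'] = df['Date'].where(df['Date'].str.len() > 7).ffill()
# similar idea: mask values of length 4 or 5 (the times) instead
#df['Date'] = df['Date'].mask(df['Date'].str.len().isin([4,5])).ffill()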
print (df)
         Date                Team 1                Team 2  Score1  Score2
0    1-Oct-17                     1                   NaN       2     NaN
1    1-Oct-17          Chicago Cubs       Cincinnati Reds       1     3.0
2    1-Oct-17    Kansas City Royals  Arizona Diamondbacks       2    14.0
3    1-Oct-17    St.Louis Cardinals     Milwaukee Brewers       1     6.0
4   30-Sep-17                     1                   NaN       2     NaN
5   30-Sep-17    St.Louis Cardinals     Milwaukee Brewers       7     6.0
6   30-Sep-17          Chicago Cubs       Cincinnati Reds       9     0.0
7   30-Sep-17  San Francisco Giants      San Diego Padres       2     3.0
8   30-Sep-17        Boston Red Sox        Houston Astros       6     3.0
9   29-Sep-17                     1                   NaN       2     NaN
10  29-Sep-17          Chicago Cubs       Cincinnati Reds       5     4.0
11  29-Sep-17      New York Yankees     Toronto Blue Jays       4     0.0
12  29-Sep-17    Kansas City Royals        Detroit Tigers       1     4.0
13  29-Sep-17     Chicago White Sox    Los Angeles Angels       5     4.0
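
The length check can be tried on a small self-contained frame. This is only a sketch: the time strings and the reduced column set below are hypothetical stand-ins, since the raw input with time values is not shown here.

```python
import pandas as pd

# Hypothetical input: the short time-like strings stand in for the
# values that should be replaced by the preceding date.
df = pd.DataFrame({
    "Date": ["1-Oct-17", "3:05", "4:10", "30-Sep-17", "12:35", "29-Sep-17", "7:15"],
    "Score1": [2, 1, 2, 2, 7, 2, 5],
})

# Date strings such as "1-Oct-17" have 8 or 9 characters; the times have 4 or 5.
is_date = df["Date"].str.len() > 7

# where() keeps the dates and turns everything else into NaN;
# ffill() then carries the last seen date down the column.
df["Date"] = df["Date"].where(is_date).ffill()

print(df["Date"].tolist())
```

The threshold of 7 works only as long as every date string is longer than every time string, so it is worth checking the lengths in the real data first.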

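Another idea is to convert the values to datetimes and keep only those whose time is 0:00. Pure dates parse with a time of 00:00, so every other row is replaced with NaT and then forward-filled: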

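from datetime import time

import pandas as pd

# parse the strings; pure dates get a time of 00:00
df['Date'] = pd.to_datetime(df['Date'])
# keep only the midnight values (the real dates), then forward-fill
df['Date'] = df['Date'].where(df['Date'].dt.time == time(0, 0)).ffill()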
print (df)
         Date                Team 1                Team 2  Score1  Score2
0  2017-10-01                     1                   NaN       2     NaN
1  2017-10-01          Chicago Cubs       Cincinnati Reds       1     3.0
2  2017-10-01    Kansas City Royals  Arizona Diamondbacks       2    14.0
3  2017-10-01    St.Louis Cardinals     Milwaukee Brewers       1     6.0
4  2017-09-30                     1                   NaN       2     NaN
5  2017-09-30    St.Louis Cardinals     Milwaukee Brewers       7     6.0
6  2017-09-30          Chicago Cubs       Cincinnati Reds       9     0.0
7  2017-09-30  San Francisco Giants      San Diego Padres       2     3.0
8  2017-09-30        Boston Red Sox        Houston Astros       6     3.0
9  2017-09-29                     1                   NaN       2     NaN
10 2017-09-29          Chicago Cubs       Cincinnati Reds       5     4.0
11 2017-09-29      New York Yankees     Toronto Blue Jays       4     0.0
12 2017-09-29    Kansas City Royals        Detroit Tigers       1     4.0
13 2017-09-29     Chicago White Sox    Los Angeles Angels       5     4.0
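
Parsing a column that mixes date and time strings without an explicit format may behave differently across pandas versions. A related variant, which is not the code from the answer, is to pass the date format explicitly and let errors="coerce" turn everything that is not a date into NaT. The sample strings below are hypothetical.

```python
import pandas as pd

# Hypothetical mixed column: dates in day-month-year form plus time-like strings.
dates = pd.Series(["1-Oct-17", "3:05", "30-Sep-17", "12:35", "7:15"])

# Only strings matching %d-%b-%y are parsed; the rest become NaT.
parsed = pd.to_datetime(dates, format="%d-%b-%y", errors="coerce")

# Forward-fill so each NaT takes the most recent valid date.
filled = parsed.ffill()

print(filled)
```

This skips the comparison with time(0, 0) entirely, because the non-date rows never become datetimes in the first place.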