I have a DataFrame that contains several blank rows:
date       hour   Temp

6/1/2017   0:00   64
6/7/2017   22:00  63
6/7/2017   23:00  62

6/2/2017   0:00   62
6/2/2017   1:00   60
6/8/2017   23:00  65

6/6/2017   0:00   64
6/6/2017   1:00   64
6/12/2017  22:00  78
6/12/2017  23:00  76
I want to create the following:
date       hour   Temp  newDate

6/1/2017   0:00   64    6/1/2017
6/7/2017   22:00  63    6/1/2017
6/7/2017   23:00  62    6/1/2017

6/2/2017   0:00   62    6/2/2017
6/2/2017   1:00   60    6/2/2017
6/8/2017   23:00  65    6/2/2017

6/6/2017   0:00   64    6/6/2017
6/6/2017   1:00   64    6/6/2017
6/12/2017  22:00  78    6/6/2017
6/12/2017  23:00  76    6/6/2017
Here the new column holds the first date that appears in the date column after each blank row.
I am trying to write a for loop for this, but is there a better way?
Answer 0: (score: 1)
Here is a solution using itertools.groupby. I assume your blank rows contain NaN items, and I rely on the fact that np.nan == np.nan returns False.
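To make the NaN trick concrete, here is a minimal sketch of how comparing each value with itself splits a sequence into consecutive runs. The list `values` is made-up sample data, and `float("nan")` stands in for np.nan (both are ordinary float NaN values).

```python
from itertools import groupby

nan = float("nan")  # behaves like np.nan for this comparison
values = [nan, "6/1/2017", "6/7/2017", nan, "6/2/2017"]  # hypothetical sample

# NaN is the only value here that is not equal to itself
assert nan != nan

# groupby yields consecutive runs that share the same key:
# key is False for a NaN run and True for a run of real dates
runs = [(is_value, len(list(run)))
        for is_value, run in groupby(values, key=lambda x: x == x)]
print(runs)  # [(False, 1), (True, 2), (False, 1), (True, 1)]
```

Each True run is one block of dates between blank rows, which is what lets the first item of the run be repeated across the whole block.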
from itertools import groupby, chain
# group consecutive values by whether they are non-NaN (x == x is False only for NaN)
grouper = groupby(df['date'], key=lambda x: x == x)
# take the first item of each run and repeat it once per item in that run
chainer = chain.from_iterable([next(j)] * (len(list(j)) + 1) for _, j in grouper)
# assign to new series
df['newDate'] = list(chainer)
print(df)
         date   hour  Temp   newDate
0         NaN    NaN   NaN       NaN
1    6/1/2017   0:00  64.0  6/1/2017
2    6/7/2017  22:00  63.0  6/1/2017
3    6/7/2017  23:00  62.0  6/1/2017
4         NaN    NaN   NaN       NaN
5    6/2/2017   0:00  62.0  6/2/2017
6    6/2/2017   1:00  60.0  6/2/2017
7    6/8/2017  23:00  65.0  6/2/2017
8         NaN    NaN   NaN       NaN
9    6/6/2017   0:00  64.0  6/6/2017
10   6/6/2017   1:00  64.0  6/6/2017
11  6/12/2017  22:00  78.0  6/6/2017
12  6/12/2017  23:00  76.0  6/6/2017
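As a possible alternative that stays inside pandas, the same result can be sketched with a running count of NaN rows used as a block label. This is not the method from the answer above, only a sketch under the same assumption that blank rows are NaN; the sample frame and the name `block_id` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the first two blocks; NaN rows stand in for blank rows
df = pd.DataFrame({
    "date": [np.nan, "6/1/2017", "6/7/2017", "6/7/2017",
             np.nan, "6/2/2017", "6/2/2017", "6/8/2017"],
    "hour": [np.nan, "0:00", "22:00", "23:00",
             np.nan, "0:00", "1:00", "23:00"],
    "Temp": [np.nan, 64, 63, 62, np.nan, 62, 60, 65],
})

# Every NaN row starts a new block, so the cumulative NaN count labels the blocks
block_id = df["date"].isna().cumsum()

# "first" picks the first non-null date in each block;
# where() then puts NaN back on the blank rows themselves
first_dates = df["date"].groupby(block_id).transform("first")
df["newDate"] = first_dates.where(df["date"].notna())

print(df)
```

This avoids materialising each run as a Python list, which may matter on large frames, though the itertools version is arguably easier to follow step by step.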