I have a DataFrame with several columns and missing data:
Unit#    Mile     Direction
1 of 2   NaN      NaN
2 of 2   228.7mi  NaN
1 of 2   NaN      NaN
2 of 2   229.7mi  NaN
1 of 2   NaN      NaN
2 of 2   228.7mi  NaN
1 of 3   NaN      NaN
2 of 3   227.7mi  NaN
3 of 3   NaN      NaN
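For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is a minimal sketch: it assumes the Unit# values are strings such as "1 of 2", the Mile values are strings with a trailing "mi", and every missing entry is NaN.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the table above; NaN marks the missing entries.
df = pd.DataFrame({
    "Unit#": ["1 of 2", "2 of 2", "1 of 2", "2 of 2", "1 of 2", "2 of 2",
              "1 of 3", "2 of 3", "3 of 3"],
    "Mile": [np.nan, "228.7mi", np.nan, "229.7mi", np.nan, "228.7mi",
             np.nan, "227.7mi", np.nan],
    "Direction": [np.nan] * 9,
})

print(df)
```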
I want to do two things.
The desired output is:
Unit#    Mile     Direction
1 of 2   228.7mi  Up
2 of 2   228.7mi  Up
1 of 2   229.7mi  Up
2 of 2   229.7mi  Up
1 of 2   228.7mi  Down
2 of 2   228.7mi  Down
1 of 3   227.7mi  Down
2 of 3   227.7mi  Down
3 of 3   227.7mi  Down
My main question is:
Answer 0 (score: 3)
Use cumcount and cumsum to create a group key:
s = df.groupby(['Unit#']).cumcount().diff().ne(0).cumsum()
s
Out[606]:
0 1
1 1
2 2
3 2
4 3
5 3
6 4
7 4
8 4
dtype: int32
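To see why this yields one key per block of units, it can help to look at the intermediate steps. The sketch below rebuilds only the Unit# column from the question (assuming the values are strings like "1 of 2"). The variable names counts, starts and alt are mine, and alt is an alternative key that is not part of this answer: it simply starts a new group at every "1 of ..." row. On this data both keys agree.

```python
import pandas as pd

units = ["1 of 2", "2 of 2", "1 of 2", "2 of 2", "1 of 2", "2 of 2",
         "1 of 3", "2 of 3", "3 of 3"]
df = pd.DataFrame({"Unit#": units})

# Step 1: number each occurrence of a label (0 for its first appearance, 1 for the next, ...).
counts = df.groupby("Unit#").cumcount()
# Step 2: a non-zero difference (or the NaN on the first row) marks the start of a block.
starts = counts.diff().ne(0)
# Step 3: a running total of block starts gives the group key.
s = starts.cumsum()

# Alternative (an assumption, not from the answer): a block starts at each "1 of ..." row.
alt = df["Unit#"].str.startswith("1 of ").cumsum()

print(pd.DataFrame({"cumcount": counts, "start": starts, "key": s, "alt": alt}))
```

Note that the cumcount-based key only works when the occurrence count changes at every block boundary, which happens to hold for this data. The alternative does not depend on that, but it assumes every block begins with a "1 of ..." row.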
Then fill Mile within each group, and derive Direction from the sign of the change in Mile:
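# Fill Mile inside each group; transform keeps the original index
# (apply may add the group key as an extra index level on recent pandas versions)
df['Mile'] = df['Mile'].groupby(s).transform(lambda x: x.ffill().bfill())
# Strip the trailing 'mi' and diff; fillna(1) treats the first row as an increase
s1 = pd.to_numeric(df['Mile'].str[:-2]).diff().fillna(1)
df['Direction'] = df['Direction'].astype(object)  # the all-NaN column must accept strings
df.loc[s1 > 0, 'Direction'] = 'Up'
df.loc[s1 < 0, 'Direction'] = 'Down'
df['Direction'] = df['Direction'].ffill()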
***Yield:***
df
Out[622]:
Unit# Mile Direction
0 1of2 228.7mi Up
1 2of2 228.7mi Up
2 1of2 229.7mi Up
3 2of2 229.7mi Up
4 1of2 228.7mi Down
5 2of2 228.7mi Down
6 1of3 227.7mi Down
7 2of3 227.7mi Down
8 3of3 227.7mi Down
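Putting the steps together, here is a self-contained sketch that runs end to end. The helper name fill_mile_and_direction is made up for illustration. It uses transform rather than apply so the filled values keep the original index, and it follows the answer in treating the first row as "Up" (the fillna(1) step). It assumes every group has at least one Mile value and that Mile always ends in "mi".

```python
import numpy as np
import pandas as pd


def fill_mile_and_direction(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper wrapping the steps from the answer."""
    out = df.copy()
    # Group key: a new block starts wherever the per-label occurrence count changes.
    key = out.groupby("Unit#").cumcount().diff().ne(0).cumsum()
    # Fill Mile forward and backward inside each block.
    out["Mile"] = out["Mile"].groupby(key).transform(lambda x: x.ffill().bfill())
    # Numeric change in Mile from the previous row; the first row counts as an increase.
    delta = pd.to_numeric(out["Mile"].str[:-2]).diff().fillna(1)
    # Label the rows where Mile changes, then carry the label forward.
    direction = pd.Series(np.nan, index=out.index, dtype=object)
    direction[delta > 0] = "Up"
    direction[delta < 0] = "Down"
    out["Direction"] = direction.ffill()
    return out


# Sample data transcribed from the question.
df = pd.DataFrame({
    "Unit#": ["1 of 2", "2 of 2", "1 of 2", "2 of 2", "1 of 2", "2 of 2",
              "1 of 3", "2 of 3", "3 of 3"],
    "Mile": [np.nan, "228.7mi", np.nan, "229.7mi", np.nan, "228.7mi",
             np.nan, "227.7mi", np.nan],
    "Direction": [np.nan] * 9,
})

result = fill_mile_and_direction(df)
print(result)
```

Working on a copy leaves the input frame untouched, so the function can be rerun on the raw data while experimenting with a different group key.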