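I have a large dataset that looks like this:
There are many rows in this format.
Each NaN row has to be found from the NaN values themselves.
In other words, these rows cannot be located directly with
df['Computer']
The NaN values have to be found first, and their row indices then used to locate these rows.
So, I would like to get: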
Answer 0 (score: 0)
I tried to build a solution that also works when there are multiple consecutive NaN rows:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Subjects': ['Math', 'Computer', 'Science', 'II', 'Computer', 'Science1'],
                   'Students': [10, np.nan, np.nan, 12, np.nan, 12],
                   'Class': [3, np.nan, np.nan, 5, np.nan, 5]})
print(df)
   Class  Students  Subjects
0    3.0      10.0      Math
1    NaN       NaN  Computer
2    NaN       NaN   Science
3    5.0      12.0        II
4    NaN       NaN  Computer
5    5.0      12.0  Science1
# If NaNs always occur in both Class and Students, testing one column is enough
a = pd.Series(range(len(df))).mask(df['Class'].isnull()).bfill()
# If NaNs do not always occur in both columns, mask only rows where both are NaN
#a = pd.Series(range(len(df))).mask(df[['Class', 'Students']].isnull().all(axis=1)).bfill()
print(a)
0    0.0
1    3.0
2    3.0
3    3.0
4    5.0
5    5.0
dtype: float64
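To see why `a` works as a grouping key, the chain can be split into its three steps. This is only a sketch on the same `Class` values as above; the intermediate names `positions`, `masked` and `key` are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

# Same Class values as in the example frame above.
cls = pd.Series([3, np.nan, np.nan, 5, np.nan, 5], name="Class")

# Step 1: one running position per row.
positions = pd.Series(range(len(cls)))

# Step 2: blank out the positions of rows whose Class is NaN.
masked = positions.mask(cls.isnull())

# Step 3: back-fill, so each NaN row borrows the position of the
# next complete row and therefore lands in that row's group.
key = masked.bfill()

print(pd.DataFrame({"positions": positions, "masked": masked, "key": key}))
```

Note that this relies on the frame having the default 0..n-1 index, because `mask` aligns the condition on index labels. With a different index, building the positional Series with `index=df.index` should keep the two aligned.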
df = (df.groupby(a)
        .agg({'Subjects': ' '.join, 'Class': 'last', 'Students': 'last'})
        .reset_index(drop=True))
print(df)
              Subjects  Class  Students
0                 Math    3.0      10.0
1  Computer Science II    5.0      12.0
2    Computer Science1    5.0      12.0
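If this is needed in more than one place, the steps could be wrapped in a small function. The helper below is hypothetical (`merge_nan_rows` and its parameter names are my own, not a pandas API). It assumes that every run of NaN rows is followed by a complete row and that the text column holds strings.

```python
import numpy as np
import pandas as pd


def merge_nan_rows(df, text_col, value_cols):
    """Fold rows whose value_cols are all NaN into the next complete row.

    Hypothetical helper, not part of pandas.
    """
    incomplete = df[value_cols].isnull().all(axis=1)
    # Positional key built on df's own index so that mask() and groupby() align.
    key = pd.Series(np.arange(len(df)), index=df.index).mask(incomplete).bfill()
    # Join the text fragments; take the last non-null value of each value column.
    agg_spec = {text_col: " ".join}
    agg_spec.update({col: "last" for col in value_cols})
    return df.groupby(key).agg(agg_spec).reset_index(drop=True)


df = pd.DataFrame({
    "Subjects": ["Math", "Computer", "Science", "II", "Computer", "Science1"],
    "Students": [10, np.nan, np.nan, 12, np.nan, 12],
    "Class": [3, np.nan, np.nan, 5, np.nan, 5],
})

result = merge_nan_rows(df, "Subjects", ["Students", "Class"])
print(result)
```

Trailing NaN rows with no complete row after them would keep a NaN key after the back-fill, and `groupby` drops NaN keys by default, so those rows would be lost silently.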
Answer 1 (score: 0)
You could group by the cumulative sum of the all-NaN row mask:
In [22]: (df.groupby(df[['Students', 'Class']].isnull().all(axis=1).cumsum())
    ...:    .agg({'Subjects': ' '.join, 'Students': 'first', 'Class': 'first'}))
Out[22]:
   Students          Subjects  Class
0      10.0       Mathematics    3.0
1      12.0  Computer Science    5.0
In [23]: df
Out[23]:
      Subjects  Students  Class
0  Mathematics      10.0    3.0
1     Computer       NaN    NaN
2      Science      12.0    5.0
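One caveat that may be worth checking: the `cumsum` key starts a new group at every all-NaN row, so two NaN rows in a row end up in separate groups. The sketch below compares it with a back-filled positional key on a hypothetical frame containing consecutive NaN rows. The frame and the names `cumsum_key`, `bfill_key` and `spec` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two consecutive all-NaN rows.
df = pd.DataFrame({
    "Subjects": ["Math", "Computer", "Science", "II"],
    "Students": [10, np.nan, np.nan, 12],
    "Class": [3, np.nan, np.nan, 5],
})

all_nan = df[["Students", "Class"]].isnull().all(axis=1)

# cumsum key: increments on every all-NaN row.
cumsum_key = all_nan.cumsum()

# Back-filled positional key: NaN rows borrow the position of the
# next complete row, so a whole run of NaN rows shares one key.
bfill_key = pd.Series(range(len(df)), index=df.index).mask(all_nan).bfill()

spec = {"Subjects": " ".join, "Students": "first", "Class": "first"}
by_cumsum = df.groupby(cumsum_key).agg(spec)
by_bfill = df.groupby(bfill_key).agg(spec)

print(by_cumsum["Subjects"].tolist())  # ['Math', 'Computer', 'Science II']
print(by_bfill["Subjects"].tolist())   # ['Math', 'Computer Science II']
```

For data with at most one NaN row between complete rows, as in the three-row frame above, both keys should give the same grouping.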