Python pandas: combining two rows

Date: 2018-03-17 07:48:04

Tags: python pandas

I have a large dataset that looks like this:

[image]

There are many rows in this format.

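Each NaN row has to be found by the fact that it contains NaN.

In other words, these rows cannot be located directly with

df['Computer']

The NaN values have to be found first, and their row indices then used to locate these rows.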
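
As a rough illustration of that first step, the sketch below finds the rows whose numeric columns are all NaN and returns their row indices. The small frame and the column names `Students` and `Class` are assumptions standing in for the real dataset, which is only shown as an image.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the dataset in the question
df = pd.DataFrame({
    "Subjects": ["Mathematics", "Computer", "Science"],
    "Students": [10, np.nan, 12],
    "Class": [3, np.nan, 5],
})

# Boolean mask: True where both numeric columns are NaN
nan_mask = df[["Students", "Class"]].isna().all(axis=1)

# Row indices of the NaN rows, usable with df.loc to fetch those rows
nan_index = df.index[nan_mask]
print(nan_index.tolist())
print(df.loc[nan_index, "Subjects"].tolist())
```

With this sample frame the only NaN row is the one holding `Computer`.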

So, I would like to get:

[image]

2 Answers:

Answer 0: (score: 0)

I tried to build a solution that also works when there are multiple consecutive NaN rows:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Subjects':['Math','Computer','Science', 'II', 'Computer','Science1'],
                   'Students':[10, np.nan, np.nan, 12, np.nan, 12],
                   'Class':[3, np.nan, np.nan, 5, np.nan, 5]})

print (df)
   Class  Students  Subjects
0    3.0      10.0      Math
1    NaN       NaN  Computer
2    NaN       NaN   Science
3    5.0      12.0        II
4    NaN       NaN  Computer
5    5.0      12.0  Science1

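# If Class and Students are always NaN together, checking one column is enough:
# hide the positions of NaN rows, then back-fill with the position of the next complete row
a = pd.Series(range(len(df))).mask(df['Class'].isnull()).bfill()
# If they are not always NaN together, treat a row as NaN only when both columns are NaN:
#a = pd.Series(range(len(df))).mask(df[['Class', 'Students']].isnull().all(axis=1)).bfill()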
print (a)
0    0.0
1    3.0
2    3.0
3    3.0
4    5.0
5    5.0
dtype: float64

df = (df.groupby(a)
        .agg({'Subjects': ' '.join, 'Class':'last', 'Students':'last'})
        .reset_index(drop=True))
print (df)
              Subjects  Class  Students
0                 Math    3.0      10.0
1  Computer Science II    5.0      12.0
2    Computer Science1    5.0      12.0

Answer 1: (score: 0)

You can use

In [22]: (df.groupby(df[['Students','Class']].isnull().all(axis=1).cumsum())
            .agg({'Subjects': ' '.join, 'Students': 'first', 'Class': 'first'}))
Out[22]:
   Students          Subjects  Class
0      10.0       Mathematics    3.0
1      12.0  Computer Science    5.0
In [23]: df
Out[23]:
      Subjects  Students  Class
0  Mathematics      10.0    3.0
1     Computer       NaN    NaN
2      Science      12.0    5.0
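
One thing worth checking with the `cumsum` key is how it behaves when several NaN rows follow each other, since every NaN row starts a new group. The sketch below runs it on the six-row frame from the first answer and compares it with a shifted key that closes a group at each complete row. The shifted key is an assumption added here for comparison, not part of either answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Subjects": ["Math", "Computer", "Science", "II", "Computer", "Science1"],
    "Students": [10, np.nan, np.nan, 12, np.nan, 12],
    "Class": [3, np.nan, np.nan, 5, np.nan, 5],
})

# True for rows where both numeric columns are NaN
is_nan_row = df[["Students", "Class"]].isna().all(axis=1)

# Key from the answer above: the counter increases at every NaN row
key_cumsum = is_nan_row.cumsum()

# Alternative key (assumption): count complete rows, shifted down by one,
# so that each group ends at a complete row
key_shifted = (~is_nan_row).cumsum().shift(fill_value=0)

agg = {"Subjects": " ".join, "Students": "first", "Class": "first"}
by_cumsum = df.groupby(key_cumsum).agg(agg).reset_index(drop=True)
by_shifted = df.groupby(key_shifted).agg(agg).reset_index(drop=True)

print(by_cumsum)
print(by_shifted)
```

On this frame the `cumsum` key leaves the first `Computer` row in a group of its own with NaN values, while the shifted key produces the same three merged rows as the `mask` and `bfill` approach. When NaN rows never occur back to back, both keys should give the same grouping.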