Merging a DataFrame with lots of NaN values into one column

Time: 2017-10-30 02:55:41

Tags: python pandas

Hi, I have a pandas DataFrame like this:

       0    1    2         3           4    5    6
150  NaN  NaN  NaN       NaN  March 1980  NaN  NaN
151  NaN  NaN  NaN       NaN   June 1990  NaN  NaN
152  NaN  NaN  NaN  Sep 2015         NaN  NaN  NaN
153  NaN  NaN  NaN  Jan 1972         NaN  NaN  NaN
154  NaN  NaN  NaN  Mar 1974         NaN  NaN  NaN

I can't use dropna(), because every row contains NaN values, so I would end up with an empty DataFrame.
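A minimal sketch for reproducing this, assuming the values are plain strings stored in an object-dtype frame (the question does not show the dtypes). It shows why dropna() empties the frame: with its defaults (axis=0, how="any") a row is dropped as soon as it contains any NaN, and every row here does.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question (dtype=object is an assumption).
df = pd.DataFrame(
    [
        [nan, nan, nan, nan, "March 1980", nan, nan],
        [nan, nan, nan, nan, "June 1990", nan, nan],
        [nan, nan, nan, "Sep 2015", nan, nan, nan],
        [nan, nan, nan, "Jan 1972", nan, nan, nan],
        [nan, nan, nan, "Mar 1974", nan, nan, nan],
    ],
    index=range(150, 155),
    dtype=object,
)

# Default dropna(): axis=0, how="any", so every row is removed.
print(df.dropna().empty)  # True

# Each row holds exactly one non-null value, which is what makes collapsing possible.
print(df.notna().sum(axis=1).tolist())  # [1, 1, 1, 1, 1]
```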

Each row has exactly one value, sitting in one of the columns. Is there a way to collapse it into a single-column DataFrame, like this?

              0
150  March 1980
151   June 1990
152    Sep 2015
153    Jan 1972
154    Mar 1974

Thanks.

3 answers:

Answer 0 (score: 4)

Try

df = df.fillna('').sum(axis=1)

or

df = df.fillna('').apply(''.join, axis=1)

You get:

150    March 1980
151     June 1990
152      Sep 2015
153      Jan 1972
154      Mar 1974
dtype: object
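Both lines return a Series rather than the one-column DataFrame the question asks for. A small sketch, rebuilding the sample data as an object-dtype frame (an assumption), that checks the two variants agree and wraps the result with Series.to_frame(0):

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question (dtype=object is an assumption).
df = pd.DataFrame(
    [
        [nan, nan, nan, nan, "March 1980", nan, nan],
        [nan, nan, nan, nan, "June 1990", nan, nan],
        [nan, nan, nan, "Sep 2015", nan, nan, nan],
        [nan, nan, nan, "Jan 1972", nan, nan, nan],
        [nan, nan, nan, "Mar 1974", nan, nan, nan],
    ],
    index=range(150, 155),
    dtype=object,
)

# Replace NaN with empty strings, then concatenate each row's strings.
by_sum = df.fillna("").sum(axis=1)
by_join = df.fillna("").apply("".join, axis=1)

# Turn the Series into a single-column DataFrame with column label 0.
result = by_join.to_frame(0)
print(result)
```

The concatenation only gives a clean result because each row has a single non-empty string; a row with two values would have them glued together.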

Answer 1 (score: 3)

Is this what you want?

df.apply(lambda x: sorted(x, key=pd.isnull), axis=1, result_type='expand').dropna(axis=1)
Out[1052]: 
             0
150  March1980
151   June1990
152    Sep2015
153    Jan1972
154    Mar1974
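Why the sorted trick works: pd.isnull returns False for real values and True for NaN, and since False sorts before True, each row's value moves to position 0 (Python's sort is stable, so several values would keep their order). In current pandas, returning a list from apply(axis=1) yields a Series of lists unless result_type="expand" is passed, so this sketch uses it; the sample data is rebuilt as object dtype, which is an assumption.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question (dtype=object is an assumption).
df = pd.DataFrame(
    [
        [nan, nan, nan, nan, "March 1980", nan, nan],
        [nan, nan, nan, nan, "June 1990", nan, nan],
        [nan, nan, nan, "Sep 2015", nan, nan, nan],
        [nan, nan, nan, "Jan 1972", nan, nan, nan],
        [nan, nan, nan, "Mar 1974", nan, nan, nan],
    ],
    index=range(150, 155),
    dtype=object,
)

# One row: the non-null value is moved to the front, NaNs follow.
print(sorted(df.loc[152], key=pd.isnull)[0])  # Sep 2015

# Whole frame: expand the sorted lists into columns 0..6,
# then drop every column that still contains NaN (all but column 0).
expanded = df.apply(lambda x: sorted(x, key=pd.isnull), axis=1, result_type="expand")
collapsed = expanded.dropna(axis=1)
print(collapsed)
```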

df.bfill(axis=1).iloc[:, 0]
Out[1056]: 
150    March1980
151     June1990
152      Sep2015
153      Jan1972
154      Mar1974
Name: 0, dtype: object
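bfill(axis=1) copies each row's next valid value leftwards, so column 0 ends up holding the first non-null value of the row. A sketch with the sample data rebuilt as object dtype (an assumption), plus a hypothetical two-value row named demo showing that only the leftmost value survives when a row has more than one:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question (dtype=object is an assumption).
df = pd.DataFrame(
    [
        [nan, nan, nan, nan, "March 1980", nan, nan],
        [nan, nan, nan, nan, "June 1990", nan, nan],
        [nan, nan, nan, "Sep 2015", nan, nan, nan],
        [nan, nan, nan, "Jan 1972", nan, nan, nan],
        [nan, nan, nan, "Mar 1974", nan, nan, nan],
    ],
    index=range(150, 155),
    dtype=object,
)

# Backfill across columns, then keep the first column.
first = df.bfill(axis=1).iloc[:, 0]
print(first.tolist())

# Hypothetical row with two values: bfill keeps only the leftmost one in column 0.
demo = pd.DataFrame([[nan, "a", nan, "b"]], dtype=object)
print(demo.bfill(axis=1).iloc[:, 0].tolist())  # ['a']
```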

df.stack().dropna()
Out[1058]: 
150  4    March1980
151  4     June1990
152  3      Sep2015
153  3      Jan1972
154  3      Mar1974
dtype: object
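stack() returns a Series indexed by (row label, column label). A sketch for getting from there to the requested shape, assuming the sample data is rebuilt as object dtype: drop the column level with Series.droplevel and wrap the result with to_frame(0). The explicit dropna() is there because the reworked stack() in newer pandas (future_stack=True) keeps NaN entries instead of dropping them.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question (dtype=object is an assumption).
df = pd.DataFrame(
    [
        [nan, nan, nan, nan, "March 1980", nan, nan],
        [nan, nan, nan, nan, "June 1990", nan, nan],
        [nan, nan, nan, "Sep 2015", nan, nan, nan],
        [nan, nan, nan, "Jan 1972", nan, nan, nan],
        [nan, nan, nan, "Mar 1974", nan, nan, nan],
    ],
    index=range(150, 155),
    dtype=object,
)

# Stack to (row, column) pairs and keep only the non-null entries.
stacked = df.stack().dropna()

# Level 1 is the original column label; drop it to get one value per row label.
result = stacked.droplevel(1).to_frame(0)
print(result)
```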

Answer 2 (score: 1)

Another solution uses a boolean mask built with pd.notnull, which is much faster than the sum and sorted approaches:

sdf = pd.DataFrame(df.values[pd.notnull(df).values], index=df.index)

Output:

            0
150  March1980
151   June1990
152    Sep2015
153    Jan1972
154    Mar1974
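A sketch of the same boolean-mask idea with an explicit check added (the check is not part of the answer): indexing a 2-D array with a 2-D boolean mask returns a flat array in row-major order, so it only lines up with df.index when every row has exactly one non-null value. The sample data is rebuilt as object dtype, an assumption.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question (dtype=object is an assumption).
df = pd.DataFrame(
    [
        [nan, nan, nan, nan, "March 1980", nan, nan],
        [nan, nan, nan, nan, "June 1990", nan, nan],
        [nan, nan, nan, "Sep 2015", nan, nan, nan],
        [nan, nan, nan, "Jan 1972", nan, nan, nan],
        [nan, nan, nan, "Mar 1974", nan, nan, nan],
    ],
    index=range(150, 155),
    dtype=object,
)

values = df.to_numpy()
mask = df.notna().to_numpy()

# Added guard: the flat result must have exactly one entry per row.
if not (mask.sum(axis=1) == 1).all():
    raise ValueError("expected exactly one non-null value per row")

# values[mask] is a 1-D array read row by row, i.e. one value per row here.
sdf = pd.DataFrame(values[mask], index=df.index)
print(sdf)
```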

Timings on a larger frame (the sample repeated 1000 times):

ndf = pd.concat([df.reset_index(drop=True)] * 1000)

%%timeit
ndf.apply(lambda x: sorted(x, key=pd.isnull), axis=1, result_type='expand').dropna(axis=1)
1 loop, best of 3: 1.29 s per loop

%%timeit
ndf.bfill(axis=1).iloc[:, 0]
1 loop, best of 3: 773 ms per loop

%%timeit
ndf.fillna('').sum(axis=1)
10 loops, best of 3: 26.4 ms per loop

%%timeit
pd.DataFrame(ndf.values[pd.notnull(ndf).values], index=ndf.index)
100 loops, best of 3: 3.11 ms per loop
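The %%timeit figures above come from IPython. A rough equivalent using the standard timeit module, assuming the same object-dtype sample data; it prints timings rather than reproducing the numbers above, since absolute results depend on the machine and the pandas version. by_sum and by_mask are hypothetical helper names wrapping answers 0 and 2.

```python
import timeit

import numpy as np
import pandas as pd

nan = np.nan
# Sample data from the question (dtype=object is an assumption).
df = pd.DataFrame(
    [
        [nan, nan, nan, nan, "March 1980", nan, nan],
        [nan, nan, nan, nan, "June 1990", nan, nan],
        [nan, nan, nan, "Sep 2015", nan, nan, nan],
        [nan, nan, nan, "Jan 1972", nan, nan, nan],
        [nan, nan, nan, "Mar 1974", nan, nan, nan],
    ],
    index=range(150, 155),
    dtype=object,
)

# Same enlargement as the answer: the 5-row sample repeated 1000 times.
ndf = pd.concat([df.reset_index(drop=True)] * 1000)


def by_sum(frame):
    # Answer 0: fill NaN with "" and concatenate along each row.
    return frame.fillna("").sum(axis=1)


def by_mask(frame):
    # Answer 2: pick the non-null cells with a 2-D boolean mask.
    return pd.DataFrame(frame.to_numpy()[frame.notna().to_numpy()], index=frame.index)


for name, fn in [("fillna + sum", by_sum), ("boolean mask", by_mask)]:
    # Best of 3 repeats, 10 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(ndf), number=10, repeat=3)) / 10
    print(f"{name}: {best * 1000:.2f} ms per call")
```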