Hi, I have a pandas DataFrame that looks like this:
0 1 2 3 4 5 6
150 NaN NaN NaN NaN March 1980 NaN NaN
151 NaN NaN NaN NaN June 1990 NaN NaN
152 NaN NaN NaN Sep 2015 NaN NaN NaN
153 NaN NaN NaN Jan 1972 NaN NaN NaN
154 NaN NaN NaN Mar 1974 NaN NaN NaN
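I can't use dropna(), because I would end up with an empty DataFrame.
Every row has a value in exactly one column. Is there a way to collapse this into a single-column DataFrame?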
0
150 March 1980
151 June 1990
152 Sep 2015
153 Jan 1972
154 Mar 1974
Thanks.
Answer 0 (score: 4)
Try
df = df.fillna('').sum(axis=1)
or
df = df.fillna('').apply(''.join, axis=1)
which gives
150 March 1980
151 June 1990
152 Sep 2015
153 Jan 1972
154 Mar 1974
dtype: object
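For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame from the question and applies the join variant. The way `df` is constructed (the `cells` dict and the object-dtype array) is my own reconstruction of the sample data, not something from the question. Note that concatenating only gives a clean answer when each row holds a single non-null string.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame: seven columns (0..6),
# one non-null string per row. `cells` maps row label -> (column, text).
cells = {
    150: (4, "March 1980"),
    151: (4, "June 1990"),
    152: (3, "Sep 2015"),
    153: (3, "Jan 1972"),
    154: (3, "Mar 1974"),
}
data = np.full((len(cells), 7), np.nan, dtype=object)
for row, (col, text) in enumerate(cells.values()):
    data[row, col] = text
df = pd.DataFrame(data, index=list(cells))

# Replace NaN with empty strings, then join each row's strings together.
result = df.fillna("").apply("".join, axis=1)
print(result)
```

If a row could contain more than one value, the strings would be glued together with no separator, so this approach is best kept for the one-value-per-row case.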
Answer 1 (score: 3)
df.apply(lambda x: sorted(x, key=pd.isnull), axis=1, result_type='expand').dropna(axis=1)
Out[1052]:
0
150 March1980
151 June1990
152 Sep2015
153 Jan1972
154 Mar1974
or
df.bfill(axis=1).iloc[:, 0]
Out[1056]:
150 March1980
151 June1990
152 Sep2015
153 Jan1972
154 Mar1974
Name: 0, dtype: object
or
df.stack()
Out[1058]:
150 4 March1980
151 4 June1990
152 3 Sep2015
153 3 Jan1972
154 3 Mar1974
dtype: object
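The back-fill and stack variants can be tried on a small frame as sketched below. The three-row frame is a made-up stand-in for the question's data. Since `stack()` returns a (row, column) MultiIndex, the sketch drops the inner level to get the original row labels back; the extra `dropna()` is my own precaution in case the installed pandas version keeps missing values when stacking.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: one non-null string per row, the rest NaN.
data = np.full((3, 4), np.nan, dtype=object)
data[0, 2] = "March 1980"
data[1, 2] = "June 1990"
data[2, 1] = "Sep 2015"
df = pd.DataFrame(data, index=[150, 151, 152])

# Back-fill along each row so the first column ends up holding the value.
via_bfill = df.bfill(axis=1).iloc[:, 0]

# stack() produces a (row, column) MultiIndex. dropna() is a no-op when
# stack() has already discarded NaN; droplevel(1) restores the row labels.
via_stack = df.stack().dropna().droplevel(1)

print(via_bfill)
print(via_stack)
```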
Answer 2 (score: 1)
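Another solution uses a boolean mask built with pd.notnull, which is much faster than the sum and sorted approaches. It relies on each row holding exactly one non-null value, i.e.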
sdf = pd.DataFrame(df.values[pd.notnull(df)],index=df.index)
Output:
0
150 March1980
151 June1990
152 Sep2015
153 Jan1972
154 Mar1974
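Boolean-mask indexing flattens the selected cells in row-major order, so the result only lines up with the index when every row contributes exactly one value. A minimal sketch with that check made explicit follows; the small frame and the guard are my additions, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame: one non-null string per row, the rest NaN.
data = np.full((3, 4), np.nan, dtype=object)
data[0, 2] = "March1980"
data[1, 2] = "June1990"
data[2, 1] = "Sep2015"
df = pd.DataFrame(data, index=[150, 151, 152])

# True where a cell holds a value.
mask = df.notna().to_numpy()

# Guard (an addition for illustration): the flattened selection only
# matches the index if each row has exactly one non-null cell.
if not (mask.sum(axis=1) == 1).all():
    raise ValueError("expected exactly one non-null value per row")

# Selecting with a 2-D boolean mask returns a 1-D array, one entry per row here.
sdf = pd.DataFrame(df.to_numpy()[mask], index=df.index)
print(sdf)
```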
Timings on the sample frame repeated 1000 times (5000 rows):
ndf = pd.concat([df.reset_index(drop=True)]*1000)
%%timeit
ndf.apply(lambda x : sorted(x,key=pd.isnull),axis=1).dropna(1)
1 loop, best of 3: 1.29 s per loop
%%timeit
ndf.bfill(1).iloc[:,0]
1 loop, best of 3: 773 ms per loop
%%timeit
ndf.fillna('').sum(1)
10 loops, best of 3: 26.4 ms per loop
%%timeit
pd.DataFrame(ndf.values[pd.notnull(ndf)],index=ndf.index)
100 loops, best of 3: 3.11 ms per loop
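Outside IPython, a comparison along these lines can be sketched with the standard `timeit` module. The frame below is a rebuilt stand-in for the question's data, repeated only 200 times to keep the run short, and the `candidates` dict is just a naming convenience of mine. Absolute numbers will differ by machine and pandas version, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed stand-in for the question's frame (5 rows x 7 columns).
data = np.full((5, 7), np.nan, dtype=object)
cells = [(4, "March 1980"), (4, "June 1990"), (3, "Sep 2015"),
         (3, "Jan 1972"), (3, "Mar 1974")]
for row, (col, text) in enumerate(cells):
    data[row, col] = text
ndf = pd.concat([pd.DataFrame(data)] * 200, ignore_index=True)

# Approaches from the answers above, wrapped so timeit can call them.
candidates = {
    "bfill": lambda: ndf.bfill(axis=1).iloc[:, 0],
    "join": lambda: ndf.fillna("").apply("".join, axis=1),
    "mask": lambda: pd.DataFrame(
        ndf.to_numpy()[ndf.notna().to_numpy()], index=ndf.index
    ),
}

# Best of three runs of five calls each, converted to milliseconds per call.
timings = {
    name: min(timeit.repeat(fn, number=5, repeat=3)) / 5 * 1e3
    for name, fn in candidates.items()
}
for name, ms in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ms:.2f} ms per call")
```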