Python Pandas: Convert multiple rows into a single row, ignoring NaN

Time: 2016-06-16 12:18:49

Tags: python python-2.7 pandas dataframe

I have a DataFrame like the one below:

 Age    Sex    Name ....
 12     NaN    NaN
 NaN    Male   NaN
 NaN    NaN    David

I want to convert it into a single-row DataFrame, ignoring the NaNs and merging the remaining values:

 Age    Sex    Name
 12     Male   David

How can I do this in pandas?

3 answers:

Answer 0 (score: 3)

You can call .dropna() and .reset_index() on each column and then merge the columns with pd.concat:

pd.concat([df[col].dropna().reset_index(drop=True) for col in df], axis=1)

This gives:

    Age   Sex   Name
0  12.0  Male  David
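
For reference, here is a self-contained sketch of the same approach that first rebuilds the sample frame from the question. The hand-built frame and the variable name `result` are assumptions for illustration; only the `pd.concat` expression comes from the answer.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (built by hand here as an assumption)
df = pd.DataFrame({
    "Age": [12, np.nan, np.nan],
    "Sex": [np.nan, "Male", np.nan],
    "Name": [np.nan, np.nan, "David"],
})

# Drop the NaNs in each column, renumber what is left from 0,
# then put the columns back side by side.
result = pd.concat(
    [df[col].dropna().reset_index(drop=True) for col in df],
    axis=1,
)
print(result)
```

If some columns hold more non-NaN values than others, the result has as many rows as the longest column, and the shorter columns are padded with NaN.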

Answer 1 (score: 1)

Another approach is to apply a lambda that calls first_valid_index to return the first valid value in each column:

In [246]:
df.apply(lambda x: pd.Series(x[x.first_valid_index()]))

Out[246]:
    Age   Sex   Name
0  12.0  Male  David
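
One case worth guarding against: `first_valid_index()` returns `None` when a column contains no valid value at all, so the lambda above has nothing to index with. The sketch below is one possible way to handle that; the helper name `first_valid` and the all-NaN `City` column are hypothetical and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [12, np.nan, np.nan],
    "Sex": [np.nan, "Male", np.nan],
    "Name": [np.nan, np.nan, "David"],
    "City": [np.nan, np.nan, np.nan],  # hypothetical all-NaN column
})

def first_valid(col):
    """Return the first non-NaN value of a column, or NaN if there is none."""
    idx = col.first_valid_index()  # None when every value is missing
    return np.nan if idx is None else col.loc[idx]

# apply() calls first_valid once per column and returns a Series keyed by
# column name; to_frame().T turns it into a one-row DataFrame.
result = df.apply(first_valid).to_frame().T
print(result)
```

Because the row is assembled from values of different types, the resulting columns are likely to have object dtype; convert them afterwards if the numeric type matters.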

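Answer 2 (score: 0)

This is nasty. Pandas won't renumber the index for you automatically :/, so you have to do a bit of manipulation yourself. Dunno which of these is best: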

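import numpy as np
import pandas as pd

# Raw text of the example table from the question
raw = '''
 12     NaN    NaN
 NaN    Male   NaN
 NaN    NaN    David'''

# Split on whitespace and reshape the flat list into a 3x3 array of strings
values = np.array(raw.split()).reshape(3, 3)

df = pd.DataFrame(values, columns='Age   Sex   Name'.split())
# Turn the literal string 'NaN' into a real missing value
df.replace('NaN', np.nan, inplace=True)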

def func(x):
    # Drop NaNs and renumber the remaining values from 0
    return x.dropna().reset_index(drop=True)

def func1(x):
    # Same idea, but rebuild the Series from the raw values
    x = x.dropna().values
    idx = range(x.shape[0])
    return pd.Series(x, index=idx)

def func2(x):
    # Keep only the first non-NaN value of the column
    idx = x.first_valid_index()
    return pd.Series(x[idx])

print '#'*20
print df
print '#'*20
print 1,df.apply(func,axis=0)
print '#'*20
print 2,df.apply(func1,axis=0)
print '#'*20
print 3,df.apply(func2,axis=0)
print '#'*20
print 3,pd.DataFrame({colId: df[colId].dropna().values for colId in df})

'''
output:

####################
   Age   Sex   Name
0   12   NaN    NaN
1  NaN  Male    NaN
2  NaN   NaN  David
####################
1   Age   Sex   Name
0  12  Male  David
####################
2   Age   Sex   Name
0  12  Male  David
####################
3   Age   Name   Sex
0  12  David  Male

'''
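
The index issue this answer complains about can be seen directly. The following is a minimal sketch, assuming the same sample frame as the question (built by hand here); the names `labels` and `misaligned` are made up for illustration. It shows that `dropna()` keeps the original row labels, which is why every variant above renumbers or rebuilds the index before combining the columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [12, np.nan, np.nan],
    "Sex": [np.nan, "Male", np.nan],
    "Name": [np.nan, np.nan, "David"],
})

# dropna() keeps the original row labels: 0 for Age, 1 for Sex, 2 for Name
labels = {col: list(df[col].dropna().index) for col in df}
print(labels)

# Without reset_index the pieces are aligned on those labels again,
# so the result still has three rows and the NaNs come back.
misaligned = pd.concat([df[col].dropna() for col in df], axis=1)
print(misaligned.shape)
```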