I have a DataFrame
similar to the one below:
Age Sex Name ....
12 NaN NaN
NaN Male NaN
NaN NaN David
I want to convert it into a one-row DataFrame, ignoring the NaNs and merging the rows:
Age Sex Name
12 Male David
How can I do this in pandas?
Answer 0 (score: 3)
You can use pd.concat to combine all the columns after calling .dropna() and .reset_index() on each one:
pd.concat([df[col].dropna().reset_index(drop=True) for col in df], axis=1)
which gives:
Age Sex Name
0 12.0 Male David
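A self-contained sketch of the pd.concat approach. The construction of df is an assumption (the question does not show how the frame was built); here the gaps are real np.nan values:

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the sample frame from the question.
df = pd.DataFrame({
    "Age": [12, np.nan, np.nan],
    "Sex": [np.nan, "Male", np.nan],
    "Name": [np.nan, np.nan, "David"],
})

# Drop the NaNs in each column separately and renumber what is left from 0,
# then place the columns side by side, aligned on that new 0-based index.
merged = pd.concat(
    [df[col].dropna().reset_index(drop=True) for col in df],
    axis=1,
)
print(merged)
```

Because concat aligns on the index, a column with two non-NaN values would produce a second row, with NaN filling the shorter columns.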
Answer 1 (score: 1)
Another approach is to apply a lambda that calls first_valid_index to return the value at the first valid row of each column:
In [246]:
df.apply(lambda x: pd.Series(x[x.first_valid_index()]))
Out[246]:
Age Sex Name
0 12.0 Male David
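A variant of the first_valid_index idea that also copes with a column that is entirely NaN, where first_valid_index returns None. The Empty column and the first_valid helper are hypothetical, added only to illustrate that case; instead of wrapping each value in pd.Series, this sketch returns scalars and turns the result into one row with to_frame().T:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [12, np.nan, np.nan],
    "Sex": [np.nan, "Male", np.nan],
    "Name": [np.nan, np.nan, "David"],
    "Empty": [np.nan, np.nan, np.nan],  # hypothetical all-NaN column
})

def first_valid(s: pd.Series):
    """Return the first non-NaN value of s, or NaN if there is none."""
    idx = s.first_valid_index()  # None when every value is NaN
    return s.loc[idx] if idx is not None else np.nan

# apply() over the columns yields one scalar per column, i.e. a Series indexed
# by the column names; to_frame().T flips it into a single-row DataFrame.
row = df.apply(first_valid).to_frame().T
print(row)
```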
Answer 2 (score: 0)
This is annoying. Pandas doesn't reshape the index automatically :/ so you have to do a few operations. Dunno which one is best:
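import numpy as np
import pandas as pd

raw = '''
12 NaN NaN
NaN Male NaN
NaN NaN David'''
# Split on whitespace into 9 strings and reshape into 3 rows x 3 columns.
arr = np.array(raw.split()).reshape(3, 3)
df = pd.DataFrame(arr, columns='Age Sex Name'.split())
# The cells still hold the string 'NaN'; turn them into real missing values.
df = df.replace('NaN', np.nan)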
def func(x):
    # Drop the NaNs and renumber the remaining values from 0
    # (without inplace=True, so the column passed in by apply is not mutated).
    return x.dropna().reset_index(drop=True)

def func1(x):
    # Same idea: rebuild the Series from the bare values so it
    # gets a fresh 0-based index.
    x = x.dropna().values
    return pd.Series(x, index=range(x.shape[0]))

def func2(x):
    # Keep only the value at the first non-NaN position of the column.
    idx = x.first_valid_index()
    return pd.Series(x.loc[idx])
print('#' * 20)
print(df)
print('#' * 20)
print(1, df.apply(func, axis=0))
print('#' * 20)
print(2, df.apply(func1, axis=0))
print('#' * 20)
print(3, df.apply(func2, axis=0))
print('#' * 20)
print(4, pd.DataFrame({colId: df[colId].dropna().values for colId in df}))
'''
output:
####################
Age Sex Name
0 12 NaN NaN
1 NaN Male NaN
2 NaN NaN David
####################
1 Age Sex Name
0 12 Male David
####################
2 Age Sex Name
0 12 Male David
####################
4 Age Name Sex
0 12 David Male

(pandas releases before 0.23 sorted dict keys, hence Age Name Sex;
newer releases on Python 3.6+ keep the insertion order Age Sex Name)
'''
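A shorter route that avoids resetting the index at all, swapping in DataFrame.bfill (not one of the approaches above). It assumes, as the first_valid_index answer does, that the first non-NaN value in each column is the one you want; the frame construction is again a hypothetical stand-in for the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Age": [12, np.nan, np.nan],
    "Sex": [np.nan, "Male", np.nan],
    "Name": [np.nan, np.nan, "David"],
})

# bfill() copies each column's next valid value upward, so row 0 ends up
# holding the first non-NaN value of every column. iloc[[0]] (a list, not a
# scalar) keeps the result as a one-row DataFrame instead of a Series.
row = df.bfill().iloc[[0]]
print(row)
```

Unlike the pd.concat answer, this never produces extra rows: any further non-NaN values in a column are simply ignored.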