pandas: working with text columns

Time: 2018-03-25 06:06:22

Tags: python pandas

How can I create a column from the first two letters of the values in other columns, skipping NaN? For example, I have 3 columns:

import pandas as pd

a=pd.Series(['Eyes', 'Ear', 'Hair', 'Skin'])
b=pd.Series(['Hair', 'Liver', 'Eyes', 'NaN'])
c=pd.Series(['NaN', 'Skin', 'NaN', 'NaN'])
df=pd.concat([a, b, c], axis=1)
df.columns=['First', 'Second', 'Third']

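Now I want to create a fourth column that combines the first two letters of the values in 'First', 'Second' and 'Third', sorted alphabetically (so that Ear comes before Hair regardless of which column each value is in), while skipping NaN values.

The final output of the fourth column would look like this: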

Fourth = pd.Series(['EyHa', 'EaLiSk', 'EyHa', 'Sk'])

1 answer:

Answer 0: (score: 2)

If NaN is np.nan (a missing value):

import numpy as np
import pandas as pd

a=pd.Series(['Eyes', 'Ear', 'Hair', 'Skin'])
b=pd.Series(['Hair', 'Liver', 'Eyes', np.nan])
c=pd.Series([np.nan, 'Skin', np.nan, np.nan])
df=pd.concat([a, b, c], axis=1)
df.columns=['First', 'Second', 'Third']

df['new'] = df.apply(lambda x: ''.join(sorted([y[:2] for y in x if pd.notnull(y)])), axis=1)

Another solution:

df['new'] = [''.join([y[:2] for y in x]) for x in np.sort(df.fillna('').values, axis=1)]
# alternative
# df['new'] = [''.join(sorted([y[:2] for y in x if pd.notnull(y)])) for x in df.values]
print(df)

  First Second Third     new
0  Eyes   Hair   NaN    EyHa
1   Ear  Liver  Skin  EaLiSk
2  Hair   Eyes   NaN    EyHa
3  Skin    NaN   NaN      Sk
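
For reference, the same idea can be wrapped in a small helper so the row logic is easier to read and reuse. This is only a sketch: the function name join_prefixes and its parameter n are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def join_prefixes(row, n=2):
    """Join the first n letters of each non-missing value, in sorted order.

    Hypothetical helper for illustration; assumes every non-missing
    value in the row is a string.
    """
    return ''.join(sorted(v[:n] for v in row if pd.notnull(v)))


df = pd.DataFrame({
    'First': ['Eyes', 'Ear', 'Hair', 'Skin'],
    'Second': ['Hair', 'Liver', 'Eyes', np.nan],
    'Third': [np.nan, 'Skin', np.nan, np.nan],
})

# Iterate over the rows as plain arrays, which avoids the per-row
# overhead of df.apply(..., axis=1).
df['new'] = [join_prefixes(row) for row in df.to_numpy()]
print(df['new'].tolist())  # ['EyHa', 'EaLiSk', 'EyHa', 'Sk']
```

Passing a different n (for example join_prefixes(row, n=3)) would change the prefix length without touching the rest of the code.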

If NaN is the string 'NaN':

df['new'] = df.apply(lambda x: ''.join(sorted([y[:2] for y in x if y != 'NaN'])), axis=1)

df['new'] = [''.join(sorted([y[:2] for y in x if y != 'NaN'])) for x in df.values]
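
A possible variation, assuming the 'NaN' strings are really meant as missing data: convert them to real missing values first with DataFrame.replace, then reuse the np.nan approach from above. The intermediate name cleaned is just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'First': ['Eyes', 'Ear', 'Hair', 'Skin'],
    'Second': ['Hair', 'Liver', 'Eyes', 'NaN'],
    'Third': ['NaN', 'Skin', 'NaN', 'NaN'],
})

# Turn the literal string 'NaN' into a real missing value (np.nan)
# so that pd.notnull can be used to skip it.
cleaned = df.replace('NaN', np.nan)

df['new'] = [
    ''.join(sorted(v[:2] for v in row if pd.notnull(v)))
    for row in cleaned.to_numpy()
]
print(df['new'].tolist())  # ['EyHa', 'EaLiSk', 'EyHa', 'Sk']
```

This keeps the original columns in df unchanged; assigning the result of replace back to df instead would also make other missing-value tools such as isna() and fillna() behave as expected.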