Question

I have a Pandas DataFrame with 4 columns:

     col1    col2   col3    col4
0  orange     NaN    NaN     NaN
1     NaN  tomato    NaN     NaN
2     NaN     NaN  apple     NaN
3     NaN     NaN    NaN  carrot
4     NaN  potato    NaN     NaN

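Each row contains exactly one string value, which can appear in any column. The other columns in that row are NaN. I want to create a single column that holds the string values: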

      col5 
0   orange
1   tomato
2    apple
3   carrot
4   potato

The most obvious approach is something like this:

data['col5'] = data.col1.astype(str) + data.col2.astype(str)...

and then strip the "NaN" out of the resulting strings, but that is messy and bound to lead to bugs.

Does Pandas offer a simple way to do this?

Answer 1

Here's one way, using apply and first_valid_index:

In [11]: df.apply(lambda x: x[x.first_valid_index()], axis=1)
Out[11]:
0    orange
1    tomato
2     apple
3    carrot
4    potato
dtype: object

To do this more efficiently you can drop down to NumPy:

In [21]: df.values.ravel()[np.arange(0, len(df.index) * len(df.columns), len(df.columns)) + np.argmax(df.notnull().values, axis=1)]
Out[21]: array(['orange', 'tomato', 'apple', 'carrot', 'potato'], dtype=object)

Note: both will fail if you have rows that are entirely NaN; you should filter those out first (e.g. using dropna).
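
As a rough sketch of the note above, the all-NaN rows can be dropped first and the NumPy lookup written out step by step. The sample frame (including the extra all-NaN row 5) and the names clean, first_col and result are made up for illustration; this is one possible reading of the approach, not the answer's exact code:

```python
import numpy as np
import pandas as pd

# Sample frame from the question, plus a hypothetical all-NaN row (index 5).
df = pd.DataFrame({
    'col1': ['orange', np.nan, np.nan, np.nan, np.nan, np.nan],
    'col2': [np.nan, 'tomato', np.nan, np.nan, 'potato', np.nan],
    'col3': [np.nan, np.nan, 'apple', np.nan, np.nan, np.nan],
    'col4': [np.nan, np.nan, np.nan, 'carrot', np.nan, np.nan],
}, dtype=object)

# Drop rows in which every column is NaN before looking anything up.
clean = df.dropna(how='all')

# Position of the first non-null column in each remaining row.
first_col = np.argmax(clean.notnull().values, axis=1)

# Pick one element per row with integer array indexing.
values = clean.values[np.arange(len(clean)), first_col]
result = pd.Series(values, index=clean.index)
print(result)
```

Indexing with a (row, column) pair of integer arrays does the same job as the ravel() arithmetic, just without computing flat offsets by hand.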

Answer 2

Another way (assuming each row contains exactly one string value and the rest are NaN, not the string "NaN") would be to fillna and then use max:

>>> df.fillna('').max(axis=1)
0    orange
1    tomato
2     apple
3    carrot
4    potato
dtype: object
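
A minimal sketch of why this works, assuming the frame from the question: the empty string compares lower than any non-empty string, so the row-wise maximum is the one real value. The variable name col5 and the explicit dtype=object are choices made for this illustration:

```python
import numpy as np
import pandas as pd

# Frame from the question, kept as plain Python objects.
df = pd.DataFrame({
    'col1': ['orange', np.nan, np.nan, np.nan, np.nan],
    'col2': [np.nan, 'tomato', np.nan, np.nan, 'potato'],
    'col3': [np.nan, np.nan, 'apple', np.nan, np.nan],
    'col4': [np.nan, np.nan, np.nan, 'carrot', np.nan],
}, dtype=object)

# '' sorts before every non-empty string, so max() picks the real value.
assert '' < 'apple'

# Replace NaN with '' and take the maximum across each row.
col5 = df.fillna('').max(axis=1)
print(col5.tolist())  # ['orange', 'tomato', 'apple', 'carrot', 'potato']
```

If a row held two strings, max would silently return the lexicographically larger one, so this relies on the one-value-per-row assumption stated above.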

Answer 3

Mapping a filter function across the elements of each row should do it.

data['new_col'] = data.apply(lambda row: next(filter(pd.notnull, row)), axis=1)

Create a new column using the non-null value from each row
