I have a pandas DataFrame:
Name    A1   A2   A3
Andy    1    NaN  NaN
Brian   NaN  NaN  NaN
Carlos  NaN  2    NaN
David   NaN  NaN  3
Frank   2    NaN  NaN
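In every row, at most one of the three columns A1, A2 and A3 is not NaN. I therefore want to merge them into a single column and drop the rows where all three are NaN. The DataFrame above would then become: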
Name A A-ID
Andy 1 1
Carlos 2 2
David 3 3
Frank 2 1
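A-ID records which original column (A1, A2 or A3) the value came from, as that column's number. The row for Brian is dropped because all three of its columns are NaN.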
Naively I could write a for loop to do this, but is there a more Pythonic and faster way?
Thanks
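Below is a minimal vectorized sketch of the transformation the question asks for, with no Python-level loop. It is not taken from either answer. It assumes the "Nan" entries are real NaN values and that the column names always have the form "A" followed by a number, so the ID is read from the column name after its first character.

```python
import numpy as np
import pandas as pd

# Sample data from the question ("Nan" entries treated as real NaN)
df = pd.DataFrame({
    "Name": ["Andy", "Brian", "Carlos", "David", "Frank"],
    "A1": [1, np.nan, np.nan, np.nan, 2],
    "A2": [np.nan, np.nan, 2, np.nan, np.nan],
    "A3": [np.nan, np.nan, np.nan, 3, np.nan],
})

cols = ["A1", "A2", "A3"]
mask = df[cols].notna()  # True where a cell holds a value

# Keep only rows with at least one non-NaN value among A1..A3
out = df.loc[mask.any(axis=1), ["Name"]].copy()

# First non-NaN value per row: backfill along the row, then take the leftmost column
out["A"] = df.loc[out.index, cols].bfill(axis=1).iloc[:, 0]

# Label of the first True per row ("A1", "A2", ...), minus the leading "A"
# (assumes the "A<number>" naming scheme)
out["A-ID"] = mask.loc[out.index].idxmax(axis=1).str[1:].astype(int)

print(out)
```

The boolean mask is computed once and reused twice: any(axis=1) selects the rows to keep, and idxmax(axis=1) returns the label of the first column holding True in each row.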
Answer 0 (score: 3)
This approach should give the expected result:
import pandas as pd
import numpy as np
d = {"Name": ["Andy", "Brian", "Carlos", "David", "Frank"],
     "A1": [1, np.nan, np.nan, np.nan, 2],
     "A2": [np.nan, np.nan, 2, np.nan, np.nan],
     "A3": [np.nan, np.nan, np.nan, 3, np.nan]}
df = pd.DataFrame(data=d)
cols = ["A1", "A2", "A3"]
# Drop rows where all A* values are NaN
df = df.dropna(subset=cols, how="all")
# Sum only the A* columns (NaN is skipped), which yields the single non-NaN value;
# summing the whole row would also try to add the string in the Name column
df["A"] = df[cols].sum(axis=1)
# Alternative method for getting 'A': take the first non-NaN value in each row
# df["A"] = df[cols].bfill(axis=1).iloc[:, 0]
# Last character of the name of the first non-NaN column
df["A-ID"] = df[cols].apply(lambda row: row.first_valid_index()[-1], axis=1)
# Drop the old A* columns
df = df.drop(cols, axis=1)
print(df)
Name A A-ID
0 Andy 1.0 1
2 Carlos 2.0 2
3 David 3.0 3
4 Frank 2.0 1
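The commented-out alternative in this answer relies on bfill(axis=1): backfilling along each row copies the next non-NaN value leftward, so the leftmost column ends up holding the row's first non-NaN value. The small sketch below uses hypothetical rows (not the question's data) to show this.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the single non-NaN value sits in a different column each time
vals = pd.DataFrame({
    "A1": [np.nan, 5.0, np.nan],
    "A2": [7.0, np.nan, np.nan],
    "A3": [np.nan, np.nan, 9.0],
})

filled = vals.bfill(axis=1)  # each NaN takes the next non-NaN value to its right
first = filled.iloc[:, 0]    # leftmost column now holds the first non-NaN value

print(filled)
print(first.tolist())        # [7.0, 5.0, 9.0]
```

With at most one value per row, this matches the row-wise sum. The two differ on all-NaN rows: bfill leaves NaN, while sum(axis=1) returns 0.0, which is why the answer drops those rows before summing.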
Answer 1 (score: 1)
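There are several ways to do this. Probably the simplest is to define a new column that is the sum of the other columns. Adding them with + would propagate NaN (and every row here contains at least one NaN), so use sum with min_count=1, which skips NaN but still gives NaN when all three values are missing:
df["B"] = df[["A1", "A2", "A3"]].sum(axis=1, min_count=1)
Then you keep only the rows where B is not null:
df = df[df.B.notnull()]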
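The difference between combining the columns with + and with sum can be checked on a few rows of the question's data. In the sketch below, plain + propagates NaN, a default sum skips NaN but turns an all-NaN row into 0.0, and min_count=1 keeps that row as NaN so a notnull() filter can remove it.

```python
import numpy as np
import pandas as pd

# First three rows of the question's data
df = pd.DataFrame({
    "Name": ["Andy", "Brian", "Carlos"],
    "A1": [1, np.nan, np.nan],
    "A2": [np.nan, np.nan, 2],
    "A3": [np.nan, np.nan, np.nan],
})
cols = ["A1", "A2", "A3"]

plus = df["A1"] + df["A2"] + df["A3"]          # NaN propagates: every row is NaN here
plain_sum = df[cols].sum(axis=1)               # NaN skipped, but all-NaN row becomes 0.0
safe_sum = df[cols].sum(axis=1, min_count=1)   # NaN skipped; all-NaN row stays NaN

print(plus.tolist())       # [nan, nan, nan]
print(plain_sum.tolist())  # [1.0, 0.0, 2.0]
print(safe_sum.tolist())   # [1.0, nan, 2.0]
```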
Thanks