I have been trying to use pd.melt on the following DataFrame:
MRN Name Dt1 Nam1 Loc1 Dt2 Nam2 Loc2 Dt3 Nam3 Loc3
0 1234 John 2010-01-01 CMV Eye 2010-02-10 RSV Res 2010-03-10 HSV Eye
1 1245 Joe 2011-06-10 Cdiff GI NaT NaN NaN NaT NaN NaN
2 1235 Mary 2012-05-06 Ecoli Bld NaT NaN NaN NaT NaN NaN
3 1254 Matt NaT NaN NaN NaT NaN NaN NaT NaN NaN
to get output like this:
MRN Name Dt Nam Loc
0 1234 John 2010-01-01 CMV Eye
1 1234 John 2010-02-10 RSV Res
2 1234 John 2010-03-10 HSV Eye
3 1245 Joe 2011-06-10 Cdiff GI
4 1235 Mary 2012-05-06 Ecoli Bld
5 1254 Matt NaT NaN NaN
but I have not been able to get it to work.
Answer 0 (score: 0)
You can do this without pd.melt by preparing each group of columns separately and then joining the groups with pd.concat:
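dfs = []
for i in range(1, 4):
    # Select the id columns plus the i-th group of columns.
    tmp_df = df[["MRN", "Name", f"Dt{i}", f"Nam{i}", f"Loc{i}"]]
    # Strip the numeric suffix so every group shares the same column names.
    tmp_df = tmp_df.rename(columns={f"Dt{i}": "Dt", f"Nam{i}": "Nam", f"Loc{i}": "Loc"})
    dfs.append(tmp_df.dropna())  # dropna removes rows containing NaN/NaT, including Matt's all-empty row.
df = pd.concat(dfs)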
Or, if you prefer it as one very long one-liner:
df = pd.concat([df[["MRN", "Name", f"Dt{i}", f"Nam{i}", f"Loc{i}"]].rename(columns={f"Dt{i}": "Dt", f"Nam{i}": "Nam", f"Loc{i}": "Loc"}).dropna() for i in range(1, 4)])
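For reference, here is a self-contained sketch of the same concat idea that keeps one row for a patient with no records at all (Matt), as the expected output in the question does. The DataFrame construction is my reconstruction of the question's data, and the names `id_cols`, `parts`, `long_df`, and `result`, along with the "keep the first row per patient" rule, are assumptions of mine rather than part of the answer above.

```python
import pandas as pd

# Assumed reconstruction of the DataFrame from the question.
df = pd.DataFrame({
    "MRN": [1234, 1245, 1235, 1254],
    "Name": ["John", "Joe", "Mary", "Matt"],
    "Dt1": pd.to_datetime(["2010-01-01", "2011-06-10", "2012-05-06", None]),
    "Nam1": ["CMV", "Cdiff", "Ecoli", None],
    "Loc1": ["Eye", "GI", "Bld", None],
    "Dt2": pd.to_datetime(["2010-02-10", None, None, None]),
    "Nam2": ["RSV", None, None, None],
    "Loc2": ["Res", None, None, None],
    "Dt3": pd.to_datetime(["2010-03-10", None, None, None]),
    "Nam3": ["HSV", None, None, None],
    "Loc3": ["Eye", None, None, None],
})

id_cols = ["MRN", "Name"]
parts = []
for i in range(1, 4):
    # Take the id columns plus the i-th group and strip the numeric suffix.
    part = df[id_cols + [f"Dt{i}", f"Nam{i}", f"Loc{i}"]]
    part = part.rename(columns={f"Dt{i}": "Dt", f"Nam{i}": "Nam", f"Loc{i}": "Loc"})
    parts.append(part)

# Each part keeps the original row index, so a stable sort on the index
# groups the rows per patient while preserving the 1, 2, 3 group order.
long_df = pd.concat(parts).sort_index(kind="mergesort").reset_index(drop=True)

# Keep rows that carry data, plus the first row of every patient so that
# a patient with no records still appears once.
has_data = long_df[["Dt", "Nam", "Loc"]].notna().any(axis=1)
first_per_patient = ~long_df.duplicated(subset=id_cols)
result = long_df[has_data | first_per_patient].reset_index(drop=True)

print(result)
```

Unlike a plain dropna(), this does not discard a patient whose columns are all empty; if you do want those patients removed, filtering on `has_data` alone should be enough.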
Answer 1 (score: 0)
You may have to hard-code the filtering to match your expected output:
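(
    pd.wide_to_long(df, stubnames=["Dt", "Nam", "Loc"], i=["MRN", "Name"], j="num")
    .reset_index()
    .sort_values(["Dt", "num"])  # rows with a missing Dt sort last
    .drop(columns="num")  # the positional form drop('num', 1) no longer works in pandas 2.x
    .loc[:9]  # hard-coded: keep everything up to and including index label 9
)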
MRN Name Dt Nam Loc
0 1234 John 2010-01-01 CMV Eye
1 1234 John 2010-02-10 RSV Res
2 1234 John 2010-03-10 HSV Eye
3 1245 Joe 2011-06-10 Cdiff GI
6 1235 Mary 2012-05-06 Ecoli Bld
9 1254 Matt NaN NaN NaN
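If you would rather not depend on the hard-coded `.loc[:9]`, one possible variation is to filter explicitly: keep every row that has data, plus the first group (`num == 1`) for each patient so that someone with no records still shows up once. The following is only a sketch under assumptions: the DataFrame is my reconstruction of the question's data, and the helper column `pos` and the names `order`, `long_df`, and `result` are mine.

```python
import pandas as pd

# Assumed reconstruction of the DataFrame from the question.
df = pd.DataFrame({
    "MRN": [1234, 1245, 1235, 1254],
    "Name": ["John", "Joe", "Mary", "Matt"],
    "Dt1": pd.to_datetime(["2010-01-01", "2011-06-10", "2012-05-06", None]),
    "Nam1": ["CMV", "Cdiff", "Ecoli", None],
    "Loc1": ["Eye", "GI", "Bld", None],
    "Dt2": pd.to_datetime(["2010-02-10", None, None, None]),
    "Nam2": ["RSV", None, None, None],
    "Loc2": ["Res", None, None, None],
    "Dt3": pd.to_datetime(["2010-03-10", None, None, None]),
    "Nam3": ["HSV", None, None, None],
    "Loc3": ["Eye", None, None, None],
})

long_df = pd.wide_to_long(
    df, stubnames=["Dt", "Nam", "Loc"], i=["MRN", "Name"], j="num"
).reset_index()

# Remember each patient's position in the original frame so the result
# can follow the original row order rather than sorted MRN order.
order = {mrn: pos for pos, mrn in enumerate(df["MRN"])}
long_df["pos"] = long_df["MRN"].map(order)
long_df = long_df.sort_values(["pos", "num"])

# Keep rows with data, plus group 1 for every patient as a placeholder row.
has_data = long_df[["Dt", "Nam", "Loc"]].notna().any(axis=1)
keep = has_data | (long_df["num"] == 1)
result = long_df.loc[keep, ["MRN", "Name", "Dt", "Nam", "Loc"]].reset_index(drop=True)

print(result)
```

This assumes MRN uniquely identifies a row in the wide frame, which wide_to_long itself requires of the `i` columns taken together.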