Melting multiple columns in pandas

Time: 2020-10-16 14:41:05

Tags: pandas

I have been trying to use pd.melt on the following DataFrame.

    MRN  Name        Dt1   Nam1 Loc1        Dt2 Nam2 Loc2        Dt3 Nam3 Loc3
0  1234  John 2010-01-01    CMV  Eye 2010-02-10  RSV  Res 2010-03-10  HSV  Eye
1  1245   Joe 2011-06-10  Cdiff   GI        NaT  NaN  NaN        NaT  NaN  NaN
2  1235  Mary 2012-05-06  Ecoli  Bld        NaT  NaN  NaN        NaT  NaN  NaN
3  1254  Matt        NaT    NaN  NaN        NaT  NaN  NaN        NaT  NaN  NaN

to get output like the following:

    MRN  Name         Dt    Nam  Loc
0  1234  John 2010-01-01    CMV  Eye
1  1234  John 2010-02-10    RSV  Res
2  1234  John 2010-03-10    HSV  Eye
3  1245   Joe 2011-06-10  Cdiff   GI
4  1235  Mary 2012-05-06  Ecoli  Bld
5  1254  Matt        NaT    NaN  NaN

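I have not been able to get this to work.

2 answers:

Answer 0 (score: 0)

You can do this without pd.melt by selecting each group of columns, renaming it to a common set of column names, and then joining the pieces with pd.concat: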

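import pandas as pd

dfs = []
for i in range(1, 4):
    # Select the id columns plus the i-th group of columns.
    tmp_df = df[["MRN", "Name", f"Dt{i}", f"Nam{i}", f"Loc{i}"]]
    # Strip the numeric suffix so every group shares the same column names.
    tmp_df = tmp_df.rename(columns={f"Dt{i}": "Dt", f"Nam{i}": "Nam", f"Loc{i}": "Loc"})
    # dropna removes rows with NaN/NaT (this also drops patients with no records at all).
    dfs.append(tmp_df.dropna())

df = pd.concat(dfs)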

Or, if you prefer it as a (very long) one-liner:

df = pd.concat([df[["MRN", "Name", f"Dt{i}", f"Nam{i}", f"Loc{i}"]].rename(columns={f"Dt{i}": "Dt", f"Nam{i}": "Nam", f"Loc{i}": "Loc"}).dropna() for i in range(1, 4)])
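Note that dropna() removes Matt completely, whereas the expected output in the question keeps one empty row for him. Below is a minimal, self-contained sketch of the same concat idea that keeps such a placeholder row. The sample frame is rebuilt from the question's table, and the names id_cols, parts, long_df and result are only illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (None becomes NaT/NaN).
df = pd.DataFrame({
    "MRN": [1234, 1245, 1235, 1254],
    "Name": ["John", "Joe", "Mary", "Matt"],
    "Dt1": pd.to_datetime(["2010-01-01", "2011-06-10", "2012-05-06", None]),
    "Nam1": ["CMV", "Cdiff", "Ecoli", None],
    "Loc1": ["Eye", "GI", "Bld", None],
    "Dt2": pd.to_datetime(["2010-02-10", None, None, None]),
    "Nam2": ["RSV", None, None, None],
    "Loc2": ["Res", None, None, None],
    "Dt3": pd.to_datetime(["2010-03-10", None, None, None]),
    "Nam3": ["HSV", None, None, None],
    "Loc3": ["Eye", None, None, None],
})

id_cols = ["MRN", "Name"]
parts = []
for i in range(1, 4):
    # Take the i-th column group and strip its numeric suffix.
    part = df[id_cols + [f"Dt{i}", f"Nam{i}", f"Loc{i}"]].rename(
        columns={f"Dt{i}": "Dt", f"Nam{i}": "Nam", f"Loc{i}": "Loc"}
    )
    parts.append(part)

# Stack the groups, then use a stable sort on the original row labels so
# each patient's rows stay together in group order 1, 2, 3.
long_df = pd.concat(parts).sort_index(kind="mergesort")

# Keep rows that carry any data, plus the first row of every patient so
# that patients without records still appear once.
has_data = long_df[["Dt", "Nam", "Loc"]].notna().any(axis=1).to_numpy()
is_first = ~long_df.index.duplicated(keep="first")
result = long_df[has_data | is_first].reset_index(drop=True)
print(result)
```

The mask is built from the data itself, so it should keep working if more patients or more column groups are added, as long as range(1, 4) is adjusted to the number of groups.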

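Answer 1 (score: 0)

You may have to hard-code the filtering to match your expected output:

(
    pd.wide_to_long(df, stubnames=["Dt", "Nam", "Loc"], i=["MRN", "Name"], j="num")
    .reset_index()
    .sort_values(["Dt", "num"])
    .drop(columns="num")  # the positional form .drop('num', 1) is rejected by pandas 2.x
    .loc[:9]  # hard-coded: keep everything up to and including index label 9
)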


     MRN  Name          Dt    Nam  Loc
0   1234  John  2010-01-01    CMV  Eye
1   1234  John  2010-02-10    RSV  Res
2   1234  John  2010-03-10    HSV  Eye
3   1245   Joe  2011-06-10  Cdiff   GI
6   1235  Mary  2012-05-06  Ecoli  Bld
9   1254  Matt         NaN    NaN  NaN
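The .loc[:9] slice only works for this exact frame. As a rough sketch of how the hard-coded filter could be avoided, the version below keeps every row that has data plus the first group (num == 1) of each patient. It assumes MRN is unique per row. The sample frame is rebuilt from the question, and the helper column pos and the names order, keep and result are hypothetical, introduced only for this illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (None becomes NaT/NaN).
df = pd.DataFrame({
    "MRN": [1234, 1245, 1235, 1254],
    "Name": ["John", "Joe", "Mary", "Matt"],
    "Dt1": pd.to_datetime(["2010-01-01", "2011-06-10", "2012-05-06", None]),
    "Nam1": ["CMV", "Cdiff", "Ecoli", None],
    "Loc1": ["Eye", "GI", "Bld", None],
    "Dt2": pd.to_datetime(["2010-02-10", None, None, None]),
    "Nam2": ["RSV", None, None, None],
    "Loc2": ["Res", None, None, None],
    "Dt3": pd.to_datetime(["2010-03-10", None, None, None]),
    "Nam3": ["HSV", None, None, None],
    "Loc3": ["Eye", None, None, None],
})

long_df = (
    pd.wide_to_long(df, stubnames=["Dt", "Nam", "Loc"], i=["MRN", "Name"], j="num")
    .reset_index()
)

# Remember each patient's position in the original frame so the result
# can follow the original row order instead of sorting by MRN or date.
order = {mrn: pos for pos, mrn in enumerate(df["MRN"])}
long_df["pos"] = long_df["MRN"].map(order)
long_df = long_df.sort_values(["pos", "num"])

# Keep rows with any data, plus group 1 of every patient as a placeholder
# for patients that have no records at all.
has_data = long_df[["Dt", "Nam", "Loc"]].notna().any(axis=1)
keep = has_data | (long_df["num"].astype(int) == 1)
result = long_df[keep].drop(columns=["pos", "num"]).reset_index(drop=True)
print(result)
```

Sorting by the original position and then by num, rather than by Dt, keeps each patient's rows together even when the dates of different patients interleave.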