按列(ID)的数据透视表数据重复

时间:2020-03-03 08:28:54

标签: python pandas dataframe pivot melt

我有一个带有名为“ ID”的列的DataFrame,该列具有重复的观察值。每个“ ID”行都有一个或多个“ Article”值列。我想通过'ID'转置整个数据帧分组,在唯一的'ID'的同一行添加新列。顺序很重要

我的数据框:

ID  Article     Article_2
1   Banana      NaN
2   Apple       NaN
1   Apple       Coconut
3   Tomatoe     Coconut
1   Pineapple   Tropical
2   Banana      Coconut
4   Apple       Coconut
5   Apple       Coconut
3   Apple       Pineapple

我的代码(@Erfan提供):

dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])
dfn = dfn.pivot_table(index='ID', 
                      columns=dfn.groupby('ID')['value'].cumcount().add(1),
                      values='value',
                      aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')

输出:

        Article_1   Article_2   Article_3   Article_4   Article_5   Article_6
0001    Banana      Apple       Pineapple   NaN         Coconut     Tropical
0002    Apple       Banana      NaN         Coconut     NaN         NaN
0003    Tomatoe     Apple       Coconut     Pineapple   NaN         NaN
0004    Apple       Coconut     NaN         NaN         NaN         NaN
0005    Apple       Coconut     NaN         NaN         NaN         NaN

第一行的Article_4为NaN,Article_5和6具有值。应该是Article_4椰子,Article_5热带和Article_6 NaN。 同时,Article_3为NaN,Article_4为有效值。应该是Article_3有效,其余(4,5,6)个NaNs

所需的输出:

        Article_1   Article_2   Article_3   Article_4   Article_5   Article_6
0001    Banana      Apple       Pineapple   Coconut     Tropical    NaN     
0002    Apple       Banana      Coconut     NaN         NaN         NaN
0003    Tomatoe     Apple       Coconut     Pineapple   NaN         NaN
0004    Apple       Coconut     NaN         NaN         NaN         NaN
0005    Apple       Coconut     NaN         NaN         NaN         NaN

1 个答案:

答案 0 :(得分:1)

melt之后添加DataFrame.dropna,以删除value列中丢失的行:

dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2']).dropna(subset=['value'])

dfn = dfn.pivot_table(index='ID', 
                      columns=dfn.groupby('ID')['value'].cumcount().add(1),
                      values='value',
                      aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')

print (dfn)
  Article_1 Article_2  Article_3  Article_4 Article_5
1    Banana     Apple  Pineapple    Coconut  Tropical
2     Apple    Banana    Coconut        NaN       NaN
3   Tomatoe     Apple    Coconut  Pineapple       NaN
4     Apple   Coconut        NaN        NaN       NaN
5     Apple   Coconut        NaN        NaN       NaN

如果需要所有列,请使用经过修改的justify函数:

dfn = pd.melt(df1, id_vars='ID', value_vars=['Article', 'Article_2'])

dfn = dfn.pivot_table(index='ID', 
                      columns=dfn.groupby('ID')['value'].cumcount().add(1),
                      values='value',
                      aggfunc='first').add_prefix('Article_').rename_axis(None, axis='index')

#https://stackoverflow.com/a/44559180/2901002
def justify(a, invalid_val=0, axis=1, side='left'):    
    """
    Justifies a 2D array

    Parameters
    ----------
    A : ndarray
        Input array to be justified
    axis : int
        Axis along which justification is to be made
    side : str
        Direction of justification. It could be 'left', 'right', 'up', 'down'
        It should be 'left' or 'right' for axis=1 and 'up' or 'down' for axis=0.

    """

    if invalid_val is np.nan:
        mask = pd.notna(a)
    else:
        mask = a!=invalid_val
    justified_mask = np.sort(mask,axis=axis)
    if (side=='up') | (side=='left'):
        justified_mask = np.flip(justified_mask,axis=axis)
    out = np.full(a.shape, invalid_val, dtype=object) 
    if axis==1:
        out[justified_mask] = a[mask]
    else:
        out.T[justified_mask.T] = a.T[mask.T]
    return out

dfn = pd.DataFrame(justify(dfn.values, invalid_val=np.nan, axis=1, side='left'),
                   index=dfn.index, columns=dfn.columns)
print (dfn)
  Article_1 Article_2  Article_3  Article_4 Article_5 Article_6
1    Banana     Apple  Pineapple    Coconut  Tropical       NaN
2     Apple    Banana    Coconut        NaN       NaN       NaN
3   Tomatoe     Apple    Coconut  Pineapple       NaN       NaN
4     Apple   Coconut        NaN        NaN       NaN       NaN
5     Apple   Coconut        NaN        NaN       NaN       NaN