Arrange horizontal DataFrame entries according to another DataFrame

Time: 2020-07-03 17:22:10

Tags: python pandas dataframe sorting unique

My DataFrame 1 looks like this:

ID   group_1     area_1     group_2     area_2     group_3    area_3
        
1    basketball  250        scoccer     500        swimming   100
2    volleyball  100        np.nan      np.nan     np.nan     np.nan
3    football    10         basketball  1000       np.nan     np.nan

I also have a second DataFrame, DF2, which looks like this:

ID   group_1     area_1    group_2     area_2  group_3    area_3  group_4   area_4
        
1    scoccer     500       basketball  50      basketball 200     swimming  100
2    volleyball  np.nan    np.nan      np.nan  np.nan     np.nan  np.nan    np.nan
3    basketball  1000      basketball  np.nan  football   10      np.nan    np.nan

My desired output should look like this:

ID   group_1     area_1     group_2     area_2     group_3    area_3
        
1    scoccer     500        basketball  250        swimming   100
2    volleyball  100        np.nan      np.nan     np.nan     np.nan
3    basketball  1000       football    10         np.nan     np.nan

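I want to arrange DF1 using the structure of DF2. As a first step, I need to determine the unique horizontal entries of each row in DF2 (for ID 1: scoccer, basketball, swimming), where their order matters. Then I want to sort the entries of DF1 into that order, while keeping each group paired with its correct area_x value from DF1.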
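To make the first step concrete, here is a minimal sketch of an order-preserving de-duplication for a single row of group values. The helper name `ordered_unique` is hypothetical, and it assumes missing entries show up as `None` or a float NaN:

```python
import math


def ordered_unique(values):
    """Return the distinct non-missing values in first-seen order."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    # A dict keeps insertion order (Python 3.7+), so its keys behave
    # like an ordered set.
    return list(dict.fromkeys(v for v in values if not is_missing(v)))


# Group values of DF2 for ID 1 and ID 3, copied from the tables above
row_1 = ["scoccer", "basketball", "basketball", "swimming"]
row_3 = ["basketball", "basketball", "football", float("nan")]

print(ordered_unique(row_1))  # ['scoccer', 'basketball', 'swimming']
print(ordered_unique(row_3))  # ['basketball', 'football']
```

Unlike a plain `set`, this keeps the left-to-right order of DF2, which is what the arrangement depends on.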

Edit: With @kait's answer, final_df looks like this:

ID group_1    area_1  group_2  group_3    area_3  group_4 group_5   area_5  group_6  
        
1  scoccer    500     500      basketball 250     250     swimming  100     100
2  volleyball 100     100      np.nan     np.nan  np.nan  np.nan    np.nan  np.nan
3  basketball 1000    1000     football   10      10      np.nan    np.nan  np.nan

1 Answer:

Answer 0: (score: 0)

Does this work?

First, reshape df1:

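import pandas as pd

# Reshape df1 from wide to long: one row per (ID, group, area) triple
new_rows = []
for _, row in df1.iterrows():
    for slot in range(1, 4):  # df1 has group_1 .. group_3
        new_rows.append([row['ID'], row[f'group_{slot}'], row[f'area_{slot}']])

# Drop the empty slots, i.e. rows where group and area are NaN
new_df = pd.DataFrame(new_rows, columns=['ID', 'group', 'area']).dropna()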

display(new_df)
   ID       group  area
0   1  basketball   250
1   1     scoccer   500
2   1    swimming   100
3   2  volleyball   100
6   3    football    10
7   3  basketball  1000
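As an alternative to the explicit loop, the same wide-to-long reshape can probably be done with `pd.wide_to_long`. This is only a sketch: `df1` is rebuilt here from the table in the question, and the column name `slot` and the variable `long_df` are my own choices, not something from the question:

```python
import numpy as np
import pandas as pd

# df1 rebuilt from the question's table
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "group_1": ["basketball", "volleyball", "football"],
    "area_1": [250, 100, 10],
    "group_2": ["scoccer", np.nan, "basketball"],
    "area_2": [500, np.nan, 1000],
    "group_3": ["swimming", np.nan, np.nan],
    "area_3": [100, np.nan, np.nan],
})

long_df = (
    # Split columns named <stub>_<number> into one row per (ID, number)
    pd.wide_to_long(df1, stubnames=["group", "area"], i="ID", j="slot", sep="_")
    .dropna(subset=["group"])  # discard the empty slots
    .sort_index()              # order rows by ID, then by slot
    .reset_index()
)

print(long_df)
```

The result has the columns `ID`, `slot`, `group` and `area`, so it carries the same information as `new_df` plus the original slot number.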

Next, parse df2:

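parsed_rows = []

def parse_df2(row):
    """Build one output row: group order from df2, areas from df1 (via new_df)."""
    x = {'ID': row['ID']}
    # ID is the first column, so it sits at position 0 and the groups are numbered from 1
    groups = [v for k, v in row.items() if 'group' in k or k == 'ID']
    # Collapse consecutive duplicates only; a repeat that is not adjacent would survive
    deduped = [groups[i]
               for i in range(len(groups))
               if i == 0 or groups[i] != groups[i - 1]]
    for k, v in enumerate(deduped):
        if k == 0 or pd.isna(v):  # skip the ID entry and empty slots
            continue
        x[f'group_{k}'] = v
        # Look up the area recorded in df1 for this ID and group
        mask = (new_df['ID'] == row['ID']) & (new_df['group'] == v)
        if new_df[mask].empty:
            continue  # group is absent from df1 for this ID, so the area stays missing
        x[f'area_{k}'] = new_df[mask]['area'].iloc[0]

    parsed_rows.append(x)

# apply() is used only for its side effect of filling parsed_rows
df2.apply(parse_df2, axis=1)
final_df = pd.DataFrame(parsed_rows)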

display(final_df)
 ID     group_1  area_1     group_2  area_2   group_3  area_3
  1     scoccer     500  basketball   250.0  swimming   100.0
  2  volleyball     100         NaN     NaN       NaN     NaN
  3  basketball    1000    football    10.0       NaN     NaN
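If a group can repeat in df2 without being adjacent, the consecutive-only de-duplication above would keep both copies. A possible variant, sketched here end to end on the sample data, uses `dict.fromkeys` for the ordering and a plain dictionary for the area lookup. The names `group_cols`, `area_of` and `result` are made up for this sketch, the area columns of df2 are left out because they are not used, and groups that appear in df2 but not in df1 are skipped entirely, which is my assumption about the intended behaviour:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "group_1": ["basketball", "volleyball", "football"],
    "area_1": [250, 100, 10],
    "group_2": ["scoccer", np.nan, "basketball"],
    "area_2": [500, np.nan, 1000],
    "group_3": ["swimming", np.nan, np.nan],
    "area_3": [100, np.nan, np.nan],
})
df2 = pd.DataFrame({
    "ID": [1, 2, 3],
    "group_1": ["scoccer", "volleyball", "basketball"],
    "group_2": ["basketball", np.nan, "basketball"],
    "group_3": ["basketball", np.nan, "football"],
    "group_4": ["swimming", np.nan, np.nan],
})


def group_cols(frame):
    """Return group_1, group_2, ... ordered by their numeric suffix."""
    cols = [c for c in frame.columns if c.startswith("group_")]
    return sorted(cols, key=lambda c: int(c.split("_")[1]))


# (ID, group) -> area, taken from df1
area_of = {}
for _, row in df1.iterrows():
    for g in group_cols(df1):
        if pd.notna(row[g]):
            area_of[(row["ID"], row[g])] = row[g.replace("group", "area")]

rows = []
for _, row in df2.iterrows():
    # Distinct groups of this df2 row, in left-to-right order
    ordered = dict.fromkeys(v for v in row[group_cols(df2)] if pd.notna(v))
    out = {"ID": row["ID"]}
    slot = 0
    for name in ordered:
        if (row["ID"], name) not in area_of:
            continue  # assumption: ignore groups that df1 lacks for this ID
        slot += 1
        out[f"group_{slot}"] = name
        out[f"area_{slot}"] = area_of[(row["ID"], name)]
    rows.append(out)

result = pd.DataFrame(rows)
print(result)
```

On the sample data this should give the same arrangement as `final_df` above. Groups that exist only in df1 are dropped in both versions, since the output order is driven by df2.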