How to merge duplicate rows into one when they have different values in a DataFrame

Posted: 2016-01-05 14:17:22

Tags: python pandas merge dataframe

I have a DataFrame like this:

ID  NAME    TEL_1   TEL_2   TEL_3
1   John    123456  754987  465317
1   John    465987          465987
1   John            546783
2   Robert  264687  
2   Robert          462531  
3   William 432645  765346  875137

I need to merge the rows that share the same ID into one row, keeping all of the phone values, like this:

ID  NAME    TEL_1   TEL_2   TEL_3   TEL_4   TEL_5   TEL_6
1   John    123456  754987  465317  465987  465987  546783
2   Robert  264687  462531  
3   William 432645  765346  875137  
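To make the example reproducible, here is a sketch that builds the same DataFrame in pandas. It assumes the blank cells are missing values (NaN) and puts them in the same columns as the data dictionary in the second answer; `phone_counts` is an illustrative name, not part of the question. Counting the non-missing phones per ID shows how many TEL columns each merged row has to fill (6 for John, 2 for Robert, 3 for William).

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "NAME": ["John", "John", "John", "Robert", "Robert", "William"],
    "TEL_1": [123456, 465987, np.nan, 264687, np.nan, 432645],
    "TEL_2": [754987, np.nan, 546783, np.nan, 462531, 765346],
    "TEL_3": [465317, 465987, np.nan, np.nan, np.nan, 875137],
})

# Non-missing phone cells per row, summed per ID: the values the merge must keep.
phone_counts = df.filter(like="TEL_").notna().sum(axis=1).groupby(df["ID"]).sum()
print(phone_counts)
```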

2 answers:

Answer 0 (score: 1)

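You can set the ID and NAME columns as the index, group by those index levels with groupby, and then concat each person's rows horizontally to get the output you want: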

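import pandas as pd

# Group the rows by person; ID and NAME become the index.
persons = df.set_index(['ID', 'NAME']).groupby(level=['ID', 'NAME'])
new_df = pd.DataFrame()
for details, phones in persons:
    # Put this person's rows one after another into a single column.
    person_phones = pd.concat([row for i, row in phones.iterrows()]).to_frame()
    # Renumber the stacked values TEL_0, TEL_1, ...
    person_phones.index = ['TEL_{}'.format(i) for i in range(len(person_phones))]
    # Add this person as a new column of the result.
    new_df = pd.concat([new_df, person_phones], axis=1)

# Transpose so each person is a row, then turn the index levels back into columns.
new_df.transpose().reset_index().rename(columns={'level_0': 'ID', 'level_1': 'NAME'})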

This gives:

   ID     NAME   TEL_0   TEL_1   TEL_2   TEL_3   TEL_4   TEL_5  TEL_6   TEL_7  TEL_8
0   1     John  123456  754987  465317  465987     NaN  465987    NaN  546783    NaN
1   2   Robert  264687     NaN     NaN     NaN  462531     NaN    NaN     NaN    NaN
2   3  William  432645  765346  875137     NaN     NaN     NaN    NaN     NaN    NaN
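The output above keeps every cell, blanks included, so John's row carries NaN placeholders (TEL_4, TEL_6, TEL_8) and the numbering starts at TEL_0 rather than TEL_1. A minimal sketch of one way to reach exactly the layout asked for, assuming blanks are NaN: flatten each group row by row (the same order the iterrows() concatenation produces), drop the NaN values, and renumber from TEL_1. The helper `merge_phones` and its `keys`/`prefix` parameters are hypothetical names for illustration.

```python
import numpy as np
import pandas as pd


def merge_phones(df, keys=("ID", "NAME"), prefix="TEL_"):
    """Collapse rows that share `keys` into one row of non-missing phones."""
    tel_cols = [c for c in df.columns if c.startswith(prefix)]
    records = []
    for key, group in df.groupby(list(keys)):
        # Flatten the phone cells row by row (C order), then drop NaN blanks.
        values = group[tel_cols].to_numpy(dtype=float).ravel()
        values = values[~np.isnan(values)]
        record = dict(zip(keys, key))
        # Renumber the surviving values TEL_1, TEL_2, ...
        record.update({f"{prefix}{i}": v for i, v in enumerate(values, start=1)})
        records.append(record)
    # Rows with fewer phones get NaN in the trailing TEL columns.
    return pd.DataFrame(records)


# Sample data from the question (blank cells assumed to be NaN).
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 3],
    "NAME": ["John", "John", "John", "Robert", "Robert", "William"],
    "TEL_1": [123456, 465987, np.nan, 264687, np.nan, 432645],
    "TEL_2": [754987, np.nan, 546783, np.nan, 462531, 765346],
    "TEL_3": [465317, 465987, np.nan, np.nan, np.nan, 875137],
})
print(merge_phones(df))
```

Because the columns contain NaN, pandas stores the phones as floats (123456.0); if that matters, the TEL columns can be converted with `astype("Int64")` afterwards.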

Answer 1 (score: 0)

You can try:

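import pandas as pd
import numpy as np

# Sample data from the question (None marks a blank cell).
data = {'ID': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 3},
        'NAME': {0: 'John', 1: 'John', 2: 'John', 3: 'Robert',
                 4: 'Robert', 5: 'William'},
        'TEL_1': {0: 123456, 1: 465987, 2: None, 3: 264687, 4: None,
                  5: 432645},
        'TEL_2': {0: 754987, 1: None, 2: 546783, 3: None, 4: 462531,
                  5: 765346},
        'TEL_3': {0: 465317, 1: 465987, 2: None, 3: None, 4: None,
                  5: 875137}}

df = pd.DataFrame(data)
grouped = df.groupby(['ID', 'NAME'])

def merger(group):
    # Phone columns of this group (TEL_1, TEL_2, ...).
    nr_cols = [col for col in group.columns if 'TEL_' in col]
    # One array per column, so the values are visited column by column.
    values = [group[col].values for col in nr_cols]
    new_row = pd.Series(dtype=float)
    i = 1
    for row in values:
        for nr in row:
            # Skip blank (NaN) cells and renumber the rest TEL_1, TEL_2, ...
            if not np.isnan(nr):
                new_row['TEL_{}'.format(i)] = nr
                i += 1
    return new_row

# The groups yield different numbers of phones, so apply() returns a Series
# indexed by (ID, NAME, TEL_k); unstack() turns TEL_k into columns.
merged = grouped.apply(merger).unstack().reset_index()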

The merged DataFrame will look like this:

ID NAME     TEL_1   TEL_2   TEL_3   TEL_4   TEL_5   TEL_6
1  John     123456  465987  754987  546783  465317  465987
2  Robert   264687  462531     NaN     NaN     NaN     NaN
3  William  432645  765346  875137     NaN     NaN     NaN
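Note that merger() reads the phones column by column (it builds one array per TEL column), which is why John's numbers come out as 123456, 465987, 754987, ... rather than in the row-by-row order of the question's expected output. A small illustration of the two orders with NumPy's `ravel`, using John's rows typed in by hand; `non_missing` is a hypothetical helper for this sketch.

```python
import numpy as np
import pandas as pd

# John's phone columns from the question (NaN = blank cell).
john = pd.DataFrame({
    "TEL_1": [123456, 465987, np.nan],
    "TEL_2": [754987, np.nan, 546783],
    "TEL_3": [465317, 465987, np.nan],
})


def non_missing(values):
    # Keep only the filled cells, as ints.
    return [int(v) for v in values if not np.isnan(v)]


# Column by column (Fortran order), as merger() iterates.
by_column = non_missing(john.to_numpy().ravel(order="F"))
# Row by row (C order), as in the question's expected output.
by_row = non_missing(john.to_numpy().ravel(order="C"))
print(by_column)  # [123456, 465987, 754987, 546783, 465317, 465987]
print(by_row)     # [123456, 754987, 465317, 465987, 465987, 546783]
```

If the row order matters, iterating over `group[nr_cols].values.ravel()` inside merger() instead of over the per-column arrays should give the question's order; this is a suggested change, not part of the original answer.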