Conditionally concatenating consecutive rows of a pandas DataFrame

Time: 2020-09-11 20:05:38

Tags: python pandas dataframe concatenation

I have an input DataFrame that looks like this:

NAME    TEXT
Tim     Tim Wagner is a teacher.
Tim     He is from Cleveland, Ohio.
Frank   Frank is a musician.
Tim     He likes to travel with his family
Frank   He is a performing artist who plays the cello.
Frank   He performed at the Carnegie Hall last year.
Frank   It was fantastic listening to him.

If consecutive rows have the same value in the NAME column, I want to concatenate their TEXT values into a single row.

Desired output DataFrame:

NAME    TEXT
Tim     Tim Wagner is a teacher. He is from Cleveland, Ohio.
Frank   Frank is a musician.
Tim     He likes to travel with his family
Frank   He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him.

Is using pandas `shift` the best way to do this? Any help is appreciated.

Thanks

2 answers:

Answer 0: (score: 1)

Try:

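# Label each run of consecutive equal NAME values with its own number.
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
# Group by NAME and run label, then join the TEXT values within each run.
df.groupby(['NAME', grp], sort=False)['TEXT']\
  .agg(' '.join).reset_index().drop('group', axis=1)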

Output:

    NAME                                               TEXT
0    Tim  Tim Wagner is a teacher. He is from Cleveland,...
1  Frank                                Frank is a musician
2    Tim                 He likes to travel with his family
3  Frank  He is a performing artist who plays the cello....
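
To see why this works, it can help to look at the intermediate run labels. The sketch below rebuilds the sample data from the question and runs the same steps one at a time. The `df` construction and the names `starts` and `run_id` are assumptions made for illustration and are not part of the answer above. Comparing `NAME` with its shifted copy marks the first row of each run, and `cumsum()` turns those marks into one label per run.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed here for illustration).
df = pd.DataFrame({
    'NAME': ['Tim', 'Tim', 'Frank', 'Tim', 'Frank', 'Frank', 'Frank'],
    'TEXT': [
        'Tim Wagner is a teacher.',
        'He is from Cleveland, Ohio.',
        'Frank is a musician.',
        'He likes to travel with his family',
        'He is a performing artist who plays the cello.',
        'He performed at the Carnegie Hall last year.',
        'It was fantastic listening to him.',
    ],
})

# True wherever NAME differs from the previous row, i.e. at the start of a run.
starts = df['NAME'] != df['NAME'].shift()

# The running total of run starts gives every consecutive run its own label.
run_id = starts.cumsum().rename('run_id')

# Group by NAME and run label, keeping first-appearance order, and join TEXT.
result = (
    df.groupby(['NAME', run_id], sort=False)['TEXT']
      .agg(' '.join)
      .reset_index()
      .drop(columns='run_id')
)

print(run_id.tolist())  # [1, 1, 2, 3, 4, 4, 4]
print(result)
```

Because the two `Tim` runs get different labels (1 and 3), they stay separate rows, which a plain `groupby('NAME')` would not do.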

Answer 1: (score: 0)

I built a new DataFrame by going through the rows one at a time.


import pandas as pd

df = pd.DataFrame([['Tim', 'Tim Wagner is a teacher.'],
['Tim', 'He is from Cleveland, Ohio.'],
['Frank', 'Frank is a musician'],
['Tim', 'He likes to travel with his family'],
['Frank', 'He is a performing artist who plays the cello.'],
['Frank', 'He performed at the Carnegie Hall last year'],
['Frank', 'It was fantastic listening to him']], columns=['NAME', 'TEXT'])

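col = None  # NAME of the run currently being accumulated
txt = ""    # concatenated TEXT of that run
arr = []    # finished [NAME, TEXT] rows
for _, row in df.iterrows():
    if col == row['NAME']:
        # Same NAME as the previous row: extend the current run.
        txt += ' ' + row['TEXT']
    else:
        # NAME changed: store the finished run (if any) and start a new one.
        if col is not None:
            arr.append([col, txt])
        col = row['NAME']
        txt = row['TEXT']

# The loop never stores the last run, so append it here.
if col is not None:
    arr.append([col, txt])


print(pd.DataFrame(arr, columns=['NAME', 'TEXT']))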
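
The same run-by-run idea can also be written with `itertools.groupby` from the standard library, which groups only consecutive items with equal keys. This is a minimal sketch rather than the answer's own method. It assumes a DataFrame with `NAME` and `TEXT` columns, and the helper name `merge_runs` is hypothetical.

```python
from itertools import groupby

import pandas as pd


def merge_runs(df):
    """Join TEXT across each run of consecutive rows that share a NAME."""
    pairs = zip(df['NAME'], df['TEXT'])
    rows = [
        # groupby yields one (key, iterator) per run of consecutive equal keys.
        [name, ' '.join(text for _, text in run)]
        for name, run in groupby(pairs, key=lambda pair: pair[0])
    ]
    return pd.DataFrame(rows, columns=['NAME', 'TEXT'])


# Sample data from the question (assumed here for illustration).
df = pd.DataFrame(
    [['Tim', 'Tim Wagner is a teacher.'],
     ['Tim', 'He is from Cleveland, Ohio.'],
     ['Frank', 'Frank is a musician'],
     ['Tim', 'He likes to travel with his family'],
     ['Frank', 'He is a performing artist who plays the cello.'],
     ['Frank', 'He performed at the Carnegie Hall last year'],
     ['Frank', 'It was fantastic listening to him']],
    columns=['NAME', 'TEXT'],
)

merged = merge_runs(df)
print(merged)
```

Unlike the manual loop, this version needs no special handling for the last run, because `groupby` emits every run, including the final one.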