I have an input dataframe with the following content:
NAME   TEXT
Tim    Tim Wagner is a teacher.
Tim    He is from Cleveland, Ohio.
Frank  Frank is a musician.
Tim    He like to travel with his family
Frank  He is a performing artist who plays the cello.
Frank  He performed at the Carnegie Hall last year.
Frank  It was fantastic listening to him.
I want to concatenate the TEXT column whenever consecutive rows have the same value in the NAME column.
Output dataframe:
NAME   TEXT
Tim    Tim Wagner is a teacher. He is from Cleveland, Ohio.
Frank  Frank is a musician.
Tim    He like to travel with his family
Frank  He is a performing artist who plays the cello. He performed at the Carnegie Hall last year. It was fantastic listening to him.
Is using pandas shift the best way to do this? Any help is appreciated.
Thanks
Answer 0 (score: 1)
Try:
# Give each run of consecutive identical NAME values its own group number
grp = (df['NAME'] != df['NAME'].shift()).cumsum().rename('group')
df.groupby(['NAME', grp], sort=False)['TEXT']\
  .agg(' '.join).reset_index().drop('group', axis=1)
Output:
   NAME   TEXT
0  Tim    Tim Wagner is a teacher. He is from Cleveland,...
1  Frank  Frank is a musician
2  Tim    He likes to travel with his family
3  Frank  He is a performing artist who plays the cello....
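To see why this works, here is a minimal self-contained sketch that prints the intermediate group numbers. It assumes the sample data from the question; the variable names starts, group and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'NAME': ['Tim', 'Tim', 'Frank', 'Tim', 'Frank', 'Frank', 'Frank'],
    'TEXT': [
        'Tim Wagner is a teacher.',
        'He is from Cleveland, Ohio.',
        'Frank is a musician.',
        'He like to travel with his family',
        'He is a performing artist who plays the cello.',
        'He performed at the Carnegie Hall last year.',
        'It was fantastic listening to him.',
    ],
})

# True wherever NAME differs from the previous row.
# The first row is compared against NaN, so it is always True.
starts = df['NAME'] != df['NAME'].shift()

# A running count of run starts gives every consecutive run its own number.
group = starts.cumsum().rename('group')
print(group.tolist())

# Group on NAME plus the run number, join the texts, then drop the helper column.
result = (
    df.groupby(['NAME', group], sort=False)['TEXT']
      .agg(' '.join)
      .reset_index()
      .drop(columns='group')
)
print(result)
```

The two Tim runs land in different groups because the run number changes each time NAME changes, so grouping by NAME alone would not be enough.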
Answer 1 (score: 0)
I built a new DataFrame by going through the rows one at a time.
import pandas as pd

df = pd.DataFrame([['Tim', 'Tim Wagner is a teacher.'],
                   ['Tim', 'He is from Cleveland, Ohio.'],
                   ['Frank', 'Frank is a musician'],
                   ['Tim', 'He likes to travel with his family'],
                   ['Frank', 'He is a performing artist who plays the cello.'],
                   ['Frank', 'He performed at the Carnegie Hall last year'],
                   ['Frank', 'It was fantastic listening to him']], columns=['NAME', 'TEXT'])

col = None  # NAME of the run currently being built
txt = ""    # concatenated TEXT of that run
arr = []    # finished [NAME, TEXT] rows
for _, row in df.iterrows():
    if col == row['NAME']:
        # Same NAME as the previous row: extend the current run
        txt += ' ' + row['TEXT']
    else:
        # NAME changed: store the finished run (if any) and start a new one
        if col is not None:
            arr.append([col, txt])
        col = row['NAME']
        txt = row['TEXT']
# The loop never stores the last run, so add it here
if col is not None:
    arr.append([col, txt])
print(pd.DataFrame(arr, columns=['NAME', 'TEXT']))
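The same row-by-row idea can also be sketched with itertools.groupby from the standard library, which groups consecutive items that share a key. This is a different technique from the loop above, not a rewrite of it, and the names rows, merged and out are only illustrative. It assumes the sample data from the question.

```python
from itertools import groupby

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'NAME': ['Tim', 'Tim', 'Frank', 'Tim', 'Frank', 'Frank', 'Frank'],
    'TEXT': [
        'Tim Wagner is a teacher.',
        'He is from Cleveland, Ohio.',
        'Frank is a musician.',
        'He like to travel with his family',
        'He is a performing artist who plays the cello.',
        'He performed at the Carnegie Hall last year.',
        'It was fantastic listening to him.',
    ],
})

# Pair each NAME with its TEXT, in row order.
rows = zip(df['NAME'], df['TEXT'])

# itertools.groupby starts a new group every time the key (NAME) changes,
# so each group is exactly one run of consecutive rows with the same NAME.
merged = [
    (name, ' '.join(text for _, text in run))
    for name, run in groupby(rows, key=lambda pair: pair[0])
]

out = pd.DataFrame(merged, columns=['NAME', 'TEXT'])
print(out)
```

Unlike the manual loop, there is no final run to flush by hand, because groupby yields the last run like any other.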