I want to build a seq2seq training set from a dataset of conversations.
My current dataset of conversations (named "only_transcripts_test") has the following format:
msgText          msgSource
"Question1"      visitor
"Answer1"        agent
"Answer1.2"      agent
"Question2"      visitor
"Question2.1"    visitor
"Answer2"        agent
The visitor and agent entries in msgSource follow no particular pattern.
For my seq2seq model, I want to create a dataset of the following form:
Visitor          Agent
"Question1"      "Answer1"
"Question2.1"    "Answer2"
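So every visitor question that is directly answered by an agent is included in the new dataset, together with the first answer the agent gives.
To do this, I tried three different approaches: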
seq2seq_test = pd.DataFrame(columns = ('Visitor', 'Agent'))

# Attempt 1
newLine = 1
for seq in range(len(only_transcripts_test)-1):
    if only_transcripts_test.msgSource[seq+1] == 'visitor' and only_transcripts_test.msgSource[seq+2] == 'agent':
        seq2seq_test = seq2seq_test.append({'Visitor': only_transcripts_test.msgSource[seq+1], 'Agent': only_transcripts_test.msgSource[seq+2]}, ignore_index = True)

# Attempt 2
newLine = 0
seq2seq_test = pd.DataFrame(columns = ('Visitor', 'Agent'))
for seq in range(len(only_transcripts_test)-1):
    newLine += 1
    if only_transcripts_test.msgSource[seq+1] == 'visitor' and only_transcripts_test.msgSource[seq+2] == 'agent':
        seq2seq_test.Visitor[newLine] = only_transcripts_test.msgSource[seq+1]
        seq2seq_test.Agent[newLine] = only_transcripts_test.msgSource[seq+2]

# Attempt 3
newLine = 1
seq2seq_test = pd.DataFrame(columns = ('Visitor', 'Agent'))
for seq in range(len(only_transcripts_test)-1):
    if only_transcripts_test.msgSource[seq+1] == 'visitor' and only_transcripts_test.msgSource[seq+2] == 'agent':
        seq2seq_test.loc['Visitor'][newLine] = only_transcripts_test.msgSource[seq+1]
        seq2seq_test.loc['Agent'][newLine] = only_transcripts_test.msgSource[seq+2]
        newLine += 1
How can I change the code so that I get the desired output format?
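
For reference, the pairing rule described above (a visitor row immediately followed by an agent row, paired with that agent row) can probably be expressed without a row-by-row loop. The following is only a minimal sketch, not a definitive solution. It assumes the rows are in chronological order, that msgSource contains exactly the strings 'visitor' and 'agent', and that the values to keep come from msgText rather than msgSource. The helper name build_pairs is hypothetical, and the sample frame is rebuilt by hand from the example table.

```python
import pandas as pd

# Sample data rebuilt by hand from the example table (an assumption about the real data).
only_transcripts_test = pd.DataFrame({
    "msgText": ["Question1", "Answer1", "Answer1.2",
                "Question2", "Question2.1", "Answer2"],
    "msgSource": ["visitor", "agent", "agent",
                  "visitor", "visitor", "agent"],
})


def build_pairs(df):
    """Hypothetical helper: pair each visitor message with the agent
    message on the very next row, and drop everything else."""
    # shift(-1) aligns every row with the values of the row below it.
    next_source = df["msgSource"].shift(-1)
    next_text = df["msgText"].shift(-1)

    # Keep visitor rows whose next row is an agent row. The last row has
    # no successor (NaN), so the comparison is False there.
    mask = (df["msgSource"] == "visitor") & (next_source == "agent")

    return pd.DataFrame({
        "Visitor": df.loc[mask, "msgText"].to_numpy(),
        "Agent": next_text[mask].to_numpy(),
    })


seq2seq_test = build_pairs(only_transcripts_test)
print(seq2seq_test)
```

On the sample data this keeps the pairs (Question1, Answer1) and (Question2.1, Answer2). If the real dataset contains several conversations stacked on top of each other, a pair could span two conversations, so grouping by a conversation identifier first may be needed. The sketch also avoids DataFrame.append, which was removed in pandas 2.0.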