How do I append values to a DataFrame to build a seq2seq training set?

Time: 2019-04-05 15:42:21

Tags: python append seq2seq

I want to build a seq2seq training set from a dataset of conversations.

My current conversation dataset (named "only_transcripts_test") has the following format:


msgText          msgSource
"Question1"      visitor
"Answer1"        agent
"Answer1.2"      agent
"Question2"      visitor
"Question2.1"    visitor
"Answer2"        agent


The visitor and agent values in msgSource do not follow any particular pattern.

For my seq2seq model, I want to create a dataset of the following form:


Visitor          Agent
"Question1"      "Answer1"
"Question2.1"    "Answer2"


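So every visitor question that is answered directly by an agent is included in the new dataset, together with the first answer the agent gave.

To do this, I tried three different approaches: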

import pandas as pd

seq2seq_test = pd.DataFrame(columns = ('Visitor', 'Agent'))

#Attempt 1
newLine = 1

for seq in range(len(only_transcripts_test)-1):
    if only_transcripts_test.msgSource[seq+1] == 'visitor' and only_transcripts_test.msgSource[seq+2] == 'agent':
        seq2seq_test = seq2seq_test.append({'Visitor': only_transcripts_test.msgSource[seq+1], 'Agent': only_transcripts_test.msgSource[seq+2]}, ignore_index = True)

#Attempt 2
newLine = 0
seq2seq_test = pd.DataFrame(columns = ('Visitor', 'Agent'))

for seq in range(len(only_transcripts_test)-1):
    newLine+= 1

    if only_transcripts_test.msgSource[seq+1] == 'visitor' and only_transcripts_test.msgSource[seq+2] == 'agent':
        seq2seq_test.Visitor[newLine] = only_transcripts_test.msgSource[seq+1]
        seq2seq_test.Agent[newLine] = only_transcripts_test.msgSource[seq+2]

#Attempt 3
newLine = 1
seq2seq_test = pd.DataFrame(columns = ('Visitor', 'Agent'))

for seq in range(len(only_transcripts_test)-1):
    if only_transcripts_test.msgSource[seq+1] == 'visitor' and only_transcripts_test.msgSource[seq+2] == 'agent':
        seq2seq_test.loc['Visitor'][newLine] = only_transcripts_test.msgSource[seq+1]
        seq2seq_test.loc['Agent'][newLine] = only_transcripts_test.msgSource[seq+2]

        newLine+=1

How can I change my code so that I get the desired output format?
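One possible approach, offered as a sketch rather than a definitive answer. Two things stand out in the attempts above: they copy values from msgSource (which would yield the strings 'visitor' and 'agent') rather than from msgText, and indexing with seq+2 inside range(len(...)-1) can run past the last row. Note also that DataFrame.append was removed in pandas 2.0. The sketch below avoids the loop entirely by comparing each row with the row directly below it using Series.shift. The sample data and the helper name build_pairs are assumptions made for illustration; it also assumes the rows are in chronological order and belong to a single conversation.

```python
import pandas as pd

# Hypothetical sample data mirroring the layout described in the question.
only_transcripts_test = pd.DataFrame({
    "msgText": ["Question1", "Answer1", "Answer1.2",
                "Question2", "Question2.1", "Answer2"],
    "msgSource": ["visitor", "agent", "agent",
                  "visitor", "visitor", "agent"],
})


def build_pairs(transcripts):
    """Pair each visitor message with the agent message directly below it.

    build_pairs is a hypothetical helper name, not part of pandas.
    """
    # shift(-1) moves every value up one row, so position i holds row i+1.
    # The last row has no successor and becomes NaN, which never equals "agent".
    next_source = transcripts["msgSource"].shift(-1)
    next_text = transcripts["msgText"].shift(-1)

    # Keep only visitor rows whose next row was written by an agent.
    mask = (transcripts["msgSource"] == "visitor") & (next_source == "agent")

    return pd.DataFrame({
        "Visitor": transcripts.loc[mask, "msgText"].to_numpy(),
        "Agent": next_text[mask].to_numpy(),
    })


seq2seq_test = build_pairs(only_transcripts_test)
print(seq2seq_test)
```

Because only the row immediately after a visitor message is inspected, "Question2" (followed by another visitor message) is dropped and "Answer1.2" (a second agent reply) is never paired, which matches the desired output. If the data contains several conversations, the same logic would probably need to be applied per conversation so that pairs do not cross conversation boundaries.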

0 Answers:

No answers