I want to build a training dataset for my chatbot with TensorFlow. My corpus file looks like this:
hello!
hello,nice to meet you!
nice to meet you too!
goodbye
bye
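After reading the corpus file:
import tensorflow as tf

# Read the corpus one line per element.
ds = tf.data.TextLineDataset("./corpus.txt")
# Lowercase each line (each element arrives in the Python function as bytes).
ds = ds.map(lambda x: tf.py_func(lambda x: x.lower(), [x], tf.string, stateful=False))
# Wrap each line in begin/end-of-sentence markers.
ds = ds.map(lambda x: tf.constant("bos_ ") + x + tf.constant(" _eos"))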
I get a dataset like this:
bos_ hello! _eos
bos_ hello,nice to meet you! _eos
bos_ nice to meet you too! _eos
bos_ goodbye _eos
bos_ bye _eos
But how do I make a dataset of consecutive (input, reply) pairs like this:
('bos_ hello! _eos', 'bos_ hello,nice to meet you! _eos')
('bos_ hello,nice to meet you! _eos', 'bos_ nice to meet you too! _eos')
('bos_ nice to meet you too! _eos', 'bos_ goodbye _eos')
('bos_ goodbye _eos', 'bos_ bye _eos')
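To make the pairing explicit, here is a plain-Python sketch of the result I am after. It does not use TensorFlow, and `consecutive_pairs` and `wrapped` are names made up for illustration only; what I need is the same thing expressed on a `tf.data` dataset.

```python
def consecutive_pairs(lines):
    """Pair every line with the line that follows it."""
    lines = list(lines)
    # zip stops at the shorter sequence, so the last line only appears as a reply.
    return list(zip(lines, lines[1:]))


# The lines as they come out of the dataset above (hypothetical in-memory copy).
wrapped = [
    "bos_ hello! _eos",
    "bos_ hello,nice to meet you! _eos",
    "bos_ nice to meet you too! _eos",
    "bos_ goodbye _eos",
    "bos_ bye _eos",
]

pairs = consecutive_pairs(wrapped)
for pair in pairs:
    print(pair)
```

Five lines give four pairs, because the last line has no reply of its own.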
Also, how do I make a dataset like the following, where the input is the whole conversation so far:
('bos_ hello! _eos', 'bos_ hello,nice to meet you! _eos')
('bos_ hello! hello,nice to meet you! _eos', 'bos_ nice to meet you too! _eos')
('bos_ hello! hello,nice to meet you! nice to meet you too! _eos', 'bos_ goodbye _eos')
('bos_ hello! hello,nice to meet you! nice to meet you too! goodbye _eos', 'bos_ bye _eos')
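Again in plain Python, the second layout would be built roughly like this. This is only a sketch of the intended output, assuming the markers are added after the history is joined; `cumulative_pairs`, `BOS`, `EOS`, and `corpus` are illustrative names, not part of any library.

```python
BOS, EOS = "bos_", "_eos"


def cumulative_pairs(utterances):
    """Pair the whole conversation so far with the next utterance."""
    pairs = []
    for i in range(1, len(utterances)):
        # Everything said before utterance i, joined into one input string.
        context = " ".join(utterances[:i])
        source = f"{BOS} {context} {EOS}"
        target = f"{BOS} {utterances[i]} {EOS}"
        pairs.append((source, target))
    return pairs


# Lowercased corpus lines, without markers (hypothetical in-memory copy).
corpus = [
    "hello!",
    "hello,nice to meet you!",
    "nice to meet you too!",
    "goodbye",
    "bye",
]

history_pairs = cumulative_pairs(corpus)
for pair in history_pairs:
    print(pair)
```

Note that the input grows with every turn, so a real pipeline would probably need to cap the history length; that limit is not shown here.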