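I am trying to map unigram, bigram, and trigram vectors into the same embedding space so that I can compare phrases and single words for similarity. To get there, I build the training data as follows:
For example: Text = "Can I solve this problem?"
I have the unigrams, bigrams, and trigrams of this sentence.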
unigram_list = ["Can", "I", "solve", "this", "problem"]
bigram_list = [("Can", "I"), ("I", "solve"), ("solve", "this"), ("this", "problem")]
Is it possible to construct sentences from every possible combination of these unigrams and bigrams?
For example:
sentence_combo_1 = ["Can", ("I", "solve"), "this", "problem"]
sentence_combo_2 = ["Can", "I", ("solve", "this"), "problem"]
sentence_combo_3 = [("Can", "I"), ("solve", "this"), "problem"]
and so on.
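
To make the goal concrete, here is a minimal sketch of the enumeration I have in mind, assuming each combination is a left-to-right split of the token list into adjacent chunks of at most `max_n` words. The names `segmentations`, `Chunk`, and `max_n` are my own and hypothetical; they do not come from any library.

```python
from typing import Iterator, List, Sequence, Tuple, Union

# A chunk is either a single word (unigram) or a tuple of adjacent words (n-gram).
Chunk = Union[str, Tuple[str, ...]]


def segmentations(tokens: Sequence[str], max_n: int = 2) -> Iterator[List[Chunk]]:
    """Yield every way to split `tokens` into adjacent chunks of 1..max_n words."""
    if not tokens:
        # An empty sentence has exactly one segmentation: the empty one.
        yield []
        return
    for n in range(1, min(max_n, len(tokens)) + 1):
        # Take the first n words as one chunk: a bare string for a unigram,
        # a tuple for anything longer, matching the lists in the question.
        head: Chunk = tokens[0] if n == 1 else tuple(tokens[:n])
        # Recurse on the remaining words and prepend the chosen chunk.
        for rest in segmentations(tokens[n:], max_n):
            yield [head] + rest


tokens = ["Can", "I", "solve", "this", "problem"]
combos = list(segmentations(tokens, max_n=2))
for combo in combos:
    print(combo)
```

With `max_n=2` this yields 8 combinations for the five-word example, including the all-unigram sentence, and `max_n=3` would bring trigrams in as well. The count grows like the Fibonacci numbers with sentence length, so for long sentences it may be more practical to sample combinations than to enumerate all of them.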