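I want to find the topics in train_dict that are similar to those of a given test_dict. I have two dictionaries, train_dict and test_dict. I'm not sure how to find, for each document in test_dict, the similar or close topics in train_dict. I read that KL divergence is a technique used for this purpose, but I'm not sure how to apply it in this case.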
train_dict = {490514.0: {0: 0.039169986,
1: 0.023344912,
2: 0.028936442,
3: 0.022125904,
4: 0.040051,
5: 0.030525777,
6: 0.06751838,
7: 0.59827864,
8: 0.023744604,
9: 0.04026981,
10: 0.044118173,
11: 0.041916344},
489733.0: {0: 0.012707975,
1: 0.5981753,
4: 0.012993803,
6: 0.021207014,
7: 0.010705788,
9: 0.07442666,
10: 0.22201125,
11: 0.01359898},
497410.0: {0: 0.012707975,
1: 0.5981752,
4: 0.012993803,
6: 0.021207014,
7: 0.010705788,
9: 0.07442666,
10: 0.22201134,
11: 0.01359898}}
test_dict = {85.0: {0: 0.28180935978889465,
1: 0.02879604697227478,
2: 0.0356932207942009,
3: 0.027292393147945404,
4: 0.2815341353416443,
5: 0.03765367344021797,
6: 0.08200311660766602,
7: 0.04070392623543739,
8: 0.029300140216946602,
9: 0.04947005212306976,
10: 0.05403999984264374,
11: 0.051703985780477524},
86.0: {0: 0.28180935978889465,
1: 0.028796043246984482,
2: 0.0356932170689106,
3: 0.027292391285300255,
4: 0.2815358638763428,
5: 0.03765366971492767,
6: 0.08200132846832275,
7: 0.040703922510147095,
8: 0.02930011972784996,
9: 0.049470048397779465,
10: 0.05403999239206314,
11: 0.05170397832989693}}
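I need the Kullback-Leibler divergence between train_dict and test_dict. For each entry in test_dict, I want to find the 2 closest points among the train_dict values. I'm not sure how to compute this.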
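One possible way to do this, as a minimal sketch rather than a definitive answer: turn each document's sparse topic dict into a dense probability vector, compute KL(test || train) for every test/train pair, and keep the 2 train documents with the smallest divergence. The names `to_vector`, `kl_divergence`, `closest_train_docs`, `NUM_TOPICS` and `EPS` are hypothetical helpers introduced for illustration. The sketch assumes topic ids run from 0 to 11 as in the sample data, and that topics missing from a document (as in the second and third train documents) can be given a tiny floor probability so the logarithm stays defined. KL divergence is asymmetric, so the direction used here is also an assumption.

```python
import math

NUM_TOPICS = 12  # assumption: topic ids are 0..11, as in the sample data
EPS = 1e-10      # assumption: floor probability for topics missing from a document


def to_vector(topic_probs, num_topics=NUM_TOPICS, eps=EPS):
    """Turn a sparse {topic_id: prob} dict into a dense list that sums to 1."""
    vec = [topic_probs.get(k, 0.0) + eps for k in range(num_topics)]
    total = sum(vec)
    return [v / total for v in vec]


def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i * log(p_i / q_i), measured in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def closest_train_docs(test_dict, train_dict, top_n=2):
    """For each test document, return the top_n train documents with the
    smallest KL(test || train), as a list of (train_id, divergence) pairs."""
    train_vecs = {doc_id: to_vector(t) for doc_id, t in train_dict.items()}
    result = {}
    for test_id, topics in test_dict.items():
        p = to_vector(topics)
        scored = sorted(
            (kl_divergence(p, q), train_id) for train_id, q in train_vecs.items()
        )
        result[test_id] = [(train_id, score) for score, train_id in scored[:top_n]]
    return result
```

With the dictionaries above, `closest_train_docs(test_dict, train_dict)` would return a dict keyed by 85.0 and 86.0, each holding the two nearest train ids with their divergences. Note that the value of `EPS` strongly affects how heavily a train document is penalized for lacking a topic that the test document has, so it is worth trying a few values. If a symmetric measure is preferred, Jensen-Shannon divergence could be swapped in for `kl_divergence`.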