我的Mysql表类似:
|DocumentID| Documents
============================================
0 Penny bought bright blue fishes.
1 Penny bought bright blue and orange fish.
2 The cat ate a fish at the store.
3 Penny went to the store. Penny ate a bug. Penn...
4 It meowed once at the bug, it is still meowing...
5 The cat is at the fish store. The cat is orang...
6 Penny is a fish
现在,我要创建一个新表,其中列是所有文档和行中的唯一词,作为与每个DocumentID
对应的值,等于
(number of times word appears in sentence) / (number of words in sentence)
。
类似:-
DocumentID ate blue bought bright bug cat fish meow once orang penni saw store went
0 0.000000 0.200000 0.200000 0.200000 0.000000 0.000 0.200000 0.000000 0.000000 0.000000 0.200000 0.000000 0.000000 0.000000
1 0.000000 0.166667 0.166667 0.166667 0.000000 0.000 0.166667 0.000000 0.000000 0.166667 0.166667 0.000000 0.000000 0.000000
2 0.250000 0.000000 0.000000 0.000000 0.000000 0.250 0.250000 0.000000 0.000000 0.000000 0.000000 0.000000 0.250000 0.000000
3 0.111111 0.000000 0.000000 0.000000 0.111111 0.000 0.111111 0.000000 0.000000 0.000000 0.333333 0.111111 0.111111 0.111111
4 0.000000 0.000000 0.000000 0.000000 0.333333 0.000 0.166667 0.333333 0.166667 0.000000 0.000000 0.000000 0.000000 0.000000
5 0.000000 0.000000 0.000000 0.000000 0.000000 0.375 0.250000 0.125000 0.000000 0.125000 0.000000 0.000000 0.125000 0.000000
6 0.000000 0.000000 0.000000 0.000000 0.000000 0.000 0.500000 0.000000 0.000000 0.000000 0.500000 0.000000 0.000000 0.000000
我尝试了很多,但没有得到预期的结果。