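I am reading an algorithm for CYK parsing, but I don't understand the data structure used in P[M,i,j] = new Tree(M,i,j,null,null,null,0.0); How would I implement this kind of array in Java? The algorithm is as follows: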
class Tree {
    NonTerm phrase;              % The non-terminal
    int startPhrase, endPhrase;  % indices of starting and ending word
    String word;                 % If a leaf, then the word
    Tree left;
    Tree right;
    double prob;
}
function CYK-PARSE(sentence,grammar) return P, a chart. {
1.  N = length(sentence);
2.  for (i = 1 to N) {
3.      word = sentence[i];
4.      for (each rule "POS --> Word [prob]" in the grammar)
5.          P[POS,i,i] = new Tree(POS,i,i,word,null,null,prob);
6.  } % endfor line 2.
7.  for (length = 2 to N) % length = length of phrase
8.      for (i = 1 to N+1-length) { % i == start of phrase
9.          j = i+length-1; % j == end of phrase
10.         for (each NonTerm M) {
11.             P[M,i,j] = new Tree(M,i,j,null,null,null,0.0);
12.             for (k = i to j-1) % k = end of first subphrase
13.                 for (each rule "M -> Y,Z [prob]" in the grammar) {
14.                     newProb = P[Y,i,k].prob * P[Z,k+1,j].prob * prob;
15.                     if (newProb > P[M,i,j].prob) {
16.                         P[M,i,j].left = P[Y,i,k];
17.                         P[M,i,j].right = P[Z,k+1,j];
18.                         P[M,i,j].prob = newProb;
19.                     } % endif line 15
20.                 } % endfor line 13
21.         } % endfor line 10
22.     } % endfor line 8
23. return P;
24. } % end CYK-PARSE.
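It says: "The main data structure in the procedure is a chart, which is the array P[M,I,J]. M is indexed over the non-terminals, and I and J range from 1 to N (the length of the sentence). P[M,I,J] is the node with NonTerm == M, startPhrase == I and endPhrase == J." I don't know what a chart is. If I wanted to implement this in Java, what data structure would I use for P[M,i,j], which holds Tree objects?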
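For illustration, here is a minimal sketch of one way the chart might be represented in Java: a three-dimensional array `Tree[][][]` wrapped in a small class, with the non-terminal mapped to an array index through an enum's `ordinal()`. The names `Chart`, `ChartDemo`, `get`, `set`, and `prob`, the three-symbol `NonTerm` enum, and the rule probabilities are all assumptions made up for this example; none of them come from the quoted algorithm. The sketch only walks through lines 5, 11, and 14 to 18 once by hand, not the full parser.

```java
// Hypothetical non-terminal set for a toy grammar; substitute your grammar's symbols.
enum NonTerm { S, NOUN, VERB }

// Java version of the Tree record from the pseudocode.
class Tree {
    final NonTerm phrase;   // the non-terminal
    final int startPhrase;  // index of the first word (1-based)
    final int endPhrase;    // index of the last word (1-based)
    final String word;      // the word if this is a leaf, otherwise null
    Tree left;
    Tree right;
    double prob;

    Tree(NonTerm phrase, int startPhrase, int endPhrase,
         String word, Tree left, Tree right, double prob) {
        this.phrase = phrase;
        this.startPhrase = startPhrase;
        this.endPhrase = endPhrase;
        this.word = word;
        this.left = left;
        this.right = right;
        this.prob = prob;
    }
}

// The chart P: a table of Tree nodes indexed by (non-terminal, start, end).
class Chart {
    private final Tree[][][] cells;

    Chart(int n) {
        // One extra slot in each word dimension so i and j can stay 1-based.
        cells = new Tree[NonTerm.values().length][n + 1][n + 1];
    }

    // Reads P[m,i,j]; returns null if the cell has never been filled.
    Tree get(NonTerm m, int i, int j) {
        return cells[m.ordinal()][i][j];
    }

    // Writes P[m,i,j] = t.
    void set(NonTerm m, int i, int j, Tree t) {
        cells[m.ordinal()][i][j] = t;
    }

    // Reads P[m,i,j].prob, treating an unfilled cell as probability 0.
    double prob(NonTerm m, int i, int j) {
        Tree t = get(m, i, j);
        return t == null ? 0.0 : t.prob;
    }
}

class ChartDemo {
    public static void main(String[] args) {
        Chart p = new Chart(2); // toy sentence "dogs bark", so N = 2

        // Line 5 of the pseudocode: leaf entries (probabilities invented for the demo).
        p.set(NonTerm.NOUN, 1, 1, new Tree(NonTerm.NOUN, 1, 1, "dogs", null, null, 0.5));
        p.set(NonTerm.VERB, 2, 2, new Tree(NonTerm.VERB, 2, 2, "bark", null, null, 0.4));

        // Line 11 for M = S, i = 1, j = 2.
        Tree s = new Tree(NonTerm.S, 1, 2, null, null, null, 0.0);
        p.set(NonTerm.S, 1, 2, s);

        // Lines 14-18 for the hypothetical rule "S -> NOUN, VERB [1.0]" with k = 1.
        double newProb = p.prob(NonTerm.NOUN, 1, 1) * p.prob(NonTerm.VERB, 2, 2) * 1.0;
        if (newProb > s.prob) {
            s.left = p.get(NonTerm.NOUN, 1, 1);
            s.right = p.get(NonTerm.VERB, 2, 2);
            s.prob = newProb;
        }
        System.out.println(s.phrase + " covers words " + s.startPhrase + ".." + s.endPhrase);
    }
}
```

One thing to watch for: Java initializes an object array to `null`, and the pseudocode only fills `P[POS,i,i]` for non-terminals that actually have a matching word rule, so a literal translation of line 14 could throw a `NullPointerException`. The `prob` helper above is one possible way around that. If the non-terminals are strings rather than an enum, a `Map` keyed on the (M, i, j) triple would be another option.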