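How to get the weight of each feature from an NLTK MaxEnt classifier

Asked: 2019-05-09 02:15:14

Tags: python nltk maxent

My maxent classifier has 8 features, and I want to know the weight of each one, because I need to know how important each feature is.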

for i in range(len(list)):  # `list` is my data here (it shadows the built-in)
    # Count-valued features for the i-th sentence pair, all starting at 0.
    features = dict.fromkeys('abcdefgh', 0)

    for j in range(len(list[i])):
        # Each item has the form "lexical/morph+lexical/morph".
        first, second = list[i][j].split('+')
        first_lexical, first_morph = first.split('/')
        second_lexical, second_morph = second.split('/')

        if first_lexical == second_lexical:
            features['a'] += 1
        if first_morph == second_morph:
            features['b'] += 1

            # Note: 'c' is never incremented anywhere in this loop.
            if "JC" in first_morph:
                features['d'] += 1
            elif first_lexical == second_lexical:
                if "EF" in first_morph or "EC" in first_morph or "ET" in first_morph:
                    features['d'] += 1
                elif "EP" in first_morph:
                    features['e'] += 1
                elif "XS" in first_morph:
                    features['f'] += 1
                elif "JX" in first_morph:
                    features['g'] += 1
                elif "JC" in first_morph:
                    # Unreachable: "JC" is already handled by the outer `if`.
                    features['h'] += 1

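I am using maximum entropy because I want to compute the structural similarity between two sentences, so each feature is a count of matching morphemes. That is why the feature values are not just 0 or 1.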
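A note on count-valued features: as far as I understand NLTK's default binary MaxEnt encoding, it does not keep one weight per feature name. Each distinct (name, value, label) combination seen in training becomes its own binary feature with its own weight. The sketch below imitates that bookkeeping in plain Python; `joint_features` and the toy data are hypothetical and only illustrate the idea, they are not NLTK code.

```python
def joint_features(train_toks):
    """Collect the distinct (name, value, label) triples seen in training."""
    seen = set()
    for featureset, label in train_toks:
        for name, value in featureset.items():
            seen.add((name, value, label))
    return sorted(seen)


# Hypothetical toy data: two count-valued features, two labels.
train_toks = [
    ({'a': 0, 'b': 2}, 'similar'),
    ({'a': 1, 'b': 2}, 'similar'),
    ({'a': 0, 'b': 0}, 'different'),
]

triples = joint_features(train_toks)
for triple in triples:
    print(triple)
print(len(triples))  # number of weights such an encoding would need
```

With only two feature names, this toy data already yields five joint features, so the number of weights depends on how many different counts and labels occur, not on the number of feature names.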

When I then run this:

print(classifier.weights())

it prints a list of 64 elements. I expected only 8 elements (one weight per feature), but the result is as follows:

[ 1.74089048  2.66009496  1.42702806  0.14474766  0.14210167  0.15642977
  0.07329622  0.19233666  0.30679333  1.05599702  1.60007152 -0.17416653
  0.09417338  0.16386887  0.27088739 -0.72500181 -8.48476894  0.2924295
  0.29734346  0.28692798  1.24685007  1.13583538  0.34032173  0.97472507
  1.21521307  1.31532032  1.57745202  0.5204001   0.76549421  1.79209505
  0.44465357  0.73647553 -1.08840863  7.89243891  1.08035386 10.01641604
  1.12682947  0.37774782  0.85929749  0.16311825  0.45568935 -0.04190585
 -0.06698004 -0.08507122 -0.02308924 -0.10700906  0.10775206  0.66603408
 -0.39178407  0.13196092  0.09278365  0.36485199  0.64181725 -3.63790857
  2.32751187 -0.87754617  0.63697054 -3.16749379 -8.87589551  0.1192744
 -2.68618694 -3.6713022  -3.79744038 -1.1949963 ]

I would like to know what each of these elements means and how to get the weight for each feature.
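One possible way to label the 64 numbers, assuming `classifier` is an `nltk.classify.MaxentClassifier` trained with the default encoding: pair each entry of `classifier.weights()` with the description its encoding gives for that index. The helper name `weights_with_descriptions` is hypothetical, and it relies on the private `_encoding` attribute, which is an implementation detail that may differ between NLTK versions. The public `classifier.show_most_informative_features(n)` prints similar weight/description pairs, sorted by absolute weight.

```python
def weights_with_descriptions(classifier):
    """Pair each weight with a description of the joint feature it belongs to.

    Assumes `classifier` exposes `weights()` and a private `_encoding`
    object with a `describe(feature_id)` method, as NLTK's
    MaxentClassifier appears to.
    """
    encoding = classifier._encoding  # private attribute: an assumption
    return [
        (encoding.describe(fid), weight)
        for fid, weight in enumerate(classifier.weights())
    ]


# Usage sketch (needs a trained classifier, so it is left commented out):
# for description, weight in weights_with_descriptions(classifier):
#     print(f"{weight:8.3f}  {description}")
```

Each description should name a feature, one of its observed values, and a label, so there is no single weight per feature name; several entries can refer to the same feature with different counts or labels.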

0 answers:

No answers yet.