Getting extended morphological information from spaCy

Time: 2018-12-19 11:05:05

Tags: python python-3.x nlp spacy

I want to use spaCy for research, and morphological information is important to me.

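While reading the documentation on rule-based morphology, I couldn't work out how to convert tags (e.g. NNP, VBZ) into morphological features (e.g. VerbForm=Fin, Mood=Ind, Tense=Pres). Is there perhaps a built-in tag map? Something like the following, built in, would be useful, but I can't seem to find it: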

{
    "NNS":  {POS: NOUN, "Number": "plur"},
    "VBG":  {POS: VERB, "VerbForm": "part", "Tense": "pres", "Aspect": "prog"},
    "DT":   {POS: DET}
    ...
} 

I found the PoS tagging table, but I can't tell whether this mapping is available in code, or even directly on the parsed tokens.


I found the tag map for English on GitHub, but I'm not sure how to import it. Any help?

1 answer:

Answer 0 (score: 1)

Looking further into the Language class, I found that you can get the default tag map with:

> import spacy
> nlp = spacy.load('en')
> print(nlp.Defaults.tag_map)
{'.': {74: 96, 'PunctType': 'peri'}, ',': {74: 96, 'PunctType': 'comm'}, '-LRB-': {74: 96, 'PunctType': 'brck', 'PunctSide': 'ini'}, '-RRB-': {74: 96, 'PunctType': 'brck', 'PunctSide': 'fin'}, '``': {74: 96, 'PunctType': 'quot', 'PunctSide': 'ini'}, '""': {74: 96, 'PunctType': 'quot', 'PunctSide': 'fin'}, "''": {74: 96, 'PunctType': 'quot', 'PunctSide': 'fin'}, ':': {74: 96}, '$': {74: 98, 'Other': {'SymType': 'currency'}}, '#': {74: 98, 'Other': {'SymType': 'numbersign'}}, 'AFX': {74: 83, 'Hyph': 'yes'}, 'CC': {74: 88, 'ConjType': 'coor'}, 'CD': {74: 92, 'NumType': 'card'}, 'DT': {74: 89}, 'EX': {74: 85, 'AdvType': 'ex'}, 'FW': {74: 100, 'Foreign': 'yes'}, 'HYPH': {74: 96, 'PunctType': 'dash'}, 'IN': {74: 84}, 'JJ': {74: 83, 'Degree': 'pos'}, 'JJR': {74: 83, 'Degree': 'comp'}, 'JJS': {74: 83, 'Degree': 'sup'}, 'LS': {74: 96, 'NumType': 'ord'}, 'MD': {74: 99, 'VerbType': 'mod'}, 'NIL': {74: ''}, 'NN': {74: 91, 'Number': 'sing'}, 'NNP': {74: 95, 'NounType': 'prop', 'Number': 'sing'}, 'NNPS': {74: 95, 'NounType': 'prop', 'Number': 'plur'}, 'NNS': {74: 91, 'Number': 'plur'}, 'PDT': {74: 83, 'AdjType': 'pdt', 'PronType': 'prn'}, 'POS': {74: 93, 'Poss': 'yes'}, 'PRP': {74: 94, 'PronType': 'prs'}, 'PRP$': {74: 83, 'PronType': 'prs', 'Poss': 'yes'}, 'RB': {74: 85, 'Degree': 'pos'}, 'RBR': {74: 85, 'Degree': 'comp'}, 'RBS': {74: 85, 'Degree': 'sup'}, 'RP': {74: 93}, 'SP': {74: 102}, 'SYM': {74: 98}, 'TO': {74: 93, 'PartType': 'inf', 'VerbForm': 'inf'}, 'UH': {74: 90}, 'VB': {74: 99, 'VerbForm': 'inf'}, 'VBD': {74: 99, 'VerbForm': 'fin', 'Tense': 'past'}, 'VBG': {74: 99, 'VerbForm': 'part', 'Tense': 'pres', 'Aspect': 'prog'}, 'VBN': {74: 99, 'VerbForm': 'part', 'Tense': 'past', 'Aspect': 'perf'}, 'VBP': {74: 99, 'VerbForm': 'fin', 'Tense': 'pres'}, 'VBZ': {74: 99, 'VerbForm': 'fin', 'Tense': 'pres', 'Number': 'sing', 'Person': 3}, 'WDT': {74: 83, 'PronType': 'int|rel'}, 'WP': {74: 91, 'PronType': 'int|rel'}, 'WP$': {74: 83, 'Poss': 'yes', 'PronType': 'int|rel'}, 'WRB': {74: 85, 'PronType': 'int|rel'}, 'ADD': {74: 100}, 'NFP': {74: 96}, 'GW': {74: 100}, 'XX': {74: 100}, 'BES': {74: 99}, 'HVS': {74: 99}, '_SP': {74: 102}}
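
The integer keys and values in that output look like spaCy symbol IDs. Comparing the printed entries for NNS, VBG and DT with the excerpt in the question suggests that 74 stands for POS, 91 for NOUN, 99 for VERB and 89 for DET. The following is only a minimal sketch under that assumption: the ID table is inferred from this one printout and may differ between spaCy versions, and describe_tag is a hypothetical helper, not part of spaCy. It turns a tag-map entry into a coarse POS plus a Feature=value string, using a few entries copied from the output above so that it runs without spaCy installed.

```python
# Symbol IDs inferred by comparing the printed tag map with the excerpt in the
# question. They are assumptions tied to that spaCy version, not a stable API.
SYMBOL_NAMES = {74: "POS", 89: "DET", 91: "NOUN", 99: "VERB"}

# A few entries copied verbatim from the printed tag map.
TAG_MAP_EXCERPT = {
    "NNS": {74: 91, "Number": "plur"},
    "VBG": {74: 99, "VerbForm": "part", "Tense": "pres", "Aspect": "prog"},
    "VBZ": {74: 99, "VerbForm": "fin", "Tense": "pres", "Number": "sing", "Person": 3},
    "DT": {74: 89},
}


def describe_tag(tag, tag_map=TAG_MAP_EXCERPT, names=SYMBOL_NAMES):
    """Return (coarse_pos, feature_string) for a fine-grained tag.

    Hypothetical helper: unknown IDs are passed through unchanged.
    """
    pos = None
    features = {}
    for key, value in tag_map[tag].items():
        key_name = names.get(key, key)
        if key_name == "POS":
            # The value under the POS key is itself a symbol ID.
            pos = names.get(value, value)
        else:
            features[str(key_name)] = str(value)
    # Sort the features by name so the string is deterministic.
    feature_string = "|".join(f"{k}={v}" for k, v in sorted(features.items()))
    return pos, feature_string


for tag in TAG_MAP_EXCERPT:
    print(tag, *describe_tag(tag))
```

To use this on real tokens, one would presumably pass nlp.Defaults.tag_map as tag_map and look up each token's token.tag_; the names table would then need to cover every symbol ID that appears, which this sketch does not attempt.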