Question

I can do the following in Python, and it shows me the submodules/attributes available on a function.

In the interpreter, I can do this:

>>> from nltk import pos_tag
>>> dir(pos_tag)
['__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__doc__', '__format__', '__get__', '__getattribute__', '__globals__', '__hash__', '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'func_closure', 'func_code', 'func_defaults', 'func_dict', 'func_doc', 'func_globals', 'func_name']

BTW, what is dir(function) actually calling?

How do I know which arguments are needed to call the function? For example, for pos_tag, the source code says it needs tokens; see https://github.com/nltk/nltk/blob/develop/nltk/tag/init.py

def pos_tag(tokens):
    """
    Use NLTK's currently recommended part of speech tagger to
    tag the given list of tokens.
        >>> from nltk.tag import pos_tag # doctest: +SKIP
        >>> from nltk.tokenize import word_tokenize # doctest: +SKIP
        >>> pos_tag(word_tokenize("John's big idea isn't all that bad.")) # doctest: +SKIP
        [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
        'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
        ('.', '.')]
    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))
    """
    tagger = load(_POS_TAGGER)
    return tagger.tag(tokens)

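If a docstring is available for the function, is there a way to know what argument type the function expects for a specific parameter? For example, in the pos_tag case above, the docstring contains :param tokens: Sequence of tokens to be tagged and :type tokens: list(str). Can I get this information while running the interpreter, without reading the code?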

Finally, is there a way to know what the return type is?

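To be clear, I'm not looking for a printout of the docstring. I'm asking the questions above so that I can later do some kind of type checking with isinstance(output_object, type).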

Answer 1

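Here are answers to your four questions. I'm afraid some of what you want to do isn't possible with the standard library unless you parse the docstrings yourself.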

(1) BTW, what is dir(function) calling?

If I understand the question correctly, I believe the documentation for dir() answers it:

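If the object has a method named __dir__(), this method will be called and must return the list of attributes. This allows objects that implement a custom __getattr__() or __getattribute__() function to customize the way dir() reports their attributes.

If the object does not provide __dir__(), the function tries its best to gather information from the object's __dict__ attribute, if defined, and from its type object.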
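The first rule can be seen in a minimal Python 3 sketch. It uses a hypothetical class `Tagger` whose `__dir__` returns a hand-picked list. `dir()` calls that hook and returns the result sorted:

```python
class Tagger:
    """Hypothetical class used only to illustrate how dir() uses __dir__."""

    def __dir__(self):
        # dir() calls this hook and sorts whatever it returns
        return ['tag', 'load']

    def __getattr__(self, name):
        # The attributes advertised by __dir__ are resolved dynamically here
        if name in ('tag', 'load'):
            return lambda *args: name
        raise AttributeError(name)


t = Tagger()
print(dir(t))  # ['load', 'tag']
```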

(2) How do I know which arguments are needed to call the function?

The best way is to use the inspect module:

>>> from nltk import pos_tag
>>> from inspect import getargspec
>>> getargspec(pos_tag)
ArgSpec(args=['tokens'], varargs=None, keywords=None, defaults=None)  # a named tuple
>>> getargspec(pos_tag).args
['tokens']
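`getargspec` is the Python 2 API. It was deprecated in Python 3 and removed in Python 3.11. The sketch below does the same lookup with `inspect.signature` in Python 3. It defines a stand-in `pos_tag` with the one-parameter signature shown in the question, which is hypothetical but lets the snippet run without nltk installed:

```python
import inspect


def pos_tag(tokens):
    """Hypothetical stand-in for nltk's pos_tag with the same signature."""
    return [(token, 'NN') for token in tokens]


sig = inspect.signature(pos_tag)
print(sig)                   # (tokens)
print(list(sig.parameters))  # ['tokens']

for name, param in sig.parameters.items():
    # param.default is inspect.Parameter.empty when the argument is required
    required = param.default is inspect.Parameter.empty
    print(name, param.kind.name, 'required' if required else 'optional')
```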

(3) If a docstring is available for the function, is there a way to know what argument type the function expects for a specific parameter?

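Not with the standard library, unless you parse the docstring yourself. You probably already know that you can access the docstring like this: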

>>> from inspect import getdoc
>>> print getdoc(pos_tag)
Use NLTK's currently recommended part of speech tagger to
tag the given list of tokens.

    >>> from nltk.tag import pos_tag
    >>> from nltk.tokenize import word_tokenize
    >>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
    [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
    'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
    ('.', '.')]

:param tokens: Sequence of tokens to be tagged
:type tokens: list(str)
:return: The tagged tokens
:rtype: list(tuple(str, str))

Or like this:

>>> print pos_tag.func_code.co_consts[0]

    Use NLTK's currently recommended part of speech tagger to
    tag the given list of tokens.

        >>> from nltk.tag import pos_tag
        >>> from nltk.tokenize import word_tokenize
        >>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
        [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
        'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
        ('.', '.')]

    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))
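In Python 3, `func_code` was renamed `__code__`, and `__doc__` gives the docstring more directly. `__doc__` holds the docstring as stored. `inspect.getdoc` additionally runs it through `inspect.cleandoc`, which removes uniform indentation and leading blank lines. Here is a sketch using a hypothetical stand-in function that carries nltk's field list:

```python
import inspect


def pos_tag(tokens):
    """
    Use NLTK's currently recommended part of speech tagger to
    tag the given list of tokens.

    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))
    """
    return [(token, 'NN') for token in tokens]  # hypothetical placeholder body


raw = pos_tag.__doc__            # the docstring as stored on the function
clean = inspect.getdoc(pos_tag)  # the same text passed through inspect.cleandoc
print(clean)
```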

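If you want to try parsing the params and "types" yourself, you could start with a regular expression. Obviously, I'm using "type" loosely here. Also, this approach only works for docstrings that list their parameters and types in this particular format: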

>>> import re
>>> params = re.findall(r'(?<=:)type\s+([\w]+):\s*(.*?)(?=\n|$)', getdoc(pos_tag))
>>> for param, type_ in params:
...     print param, '=>', type_
...
tokens => list(str)
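A Python 3 version of the same idea is sketched below. It collects any `:<field> <name>: <text>` lines into a dict, so the same helper returns both the `:type` entries and the `:param` descriptions. `docstring_fields` is a hypothetical helper, and `pos_tag` is a stand-in carrying nltk's field list:

```python
import inspect
import re


def pos_tag(tokens):
    """
    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))
    """
    return [(token, 'NN') for token in tokens]  # hypothetical stand-in body


def docstring_fields(func, field):
    """Map each ':<field> <name>: <text>' line of func's docstring to {name: text}."""
    doc = inspect.getdoc(func) or ''
    pattern = r'^:' + re.escape(field) + r'\s+(\w+):\s*(.*)$'
    # re.MULTILINE makes ^ and $ match at every line boundary
    return dict(re.findall(pattern, doc, re.MULTILINE))


print(docstring_fields(pos_tag, 'type'))   # {'tokens': 'list(str)'}
print(docstring_fields(pos_tag, 'param'))  # {'tokens': 'Sequence of tokens to be tagged'}
```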

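This approach will of course give you the parameters and their corresponding descriptions. You could also check each word in the description by splitting the string and keeping only the words for which the following holds: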

>>> word = 'str'
>>> isinstance(eval(word), type)
True
>>> isinstance(eval('list'), type)
True
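Calling `eval` on words taken from a docstring executes arbitrary text, and it raises `NameError` for any word that isn't a defined name (such as "Sequence" or "tokens"). The sketch below is a safer Python 3 variant. It assumes the type names are builtins such as `list` and `str` and looks each word up in the `builtins` module instead. `builtin_types_in` is a hypothetical helper:

```python
import builtins
import re


def builtin_types_in(description):
    """Return the builtin types named in a docstring type description, in order."""
    types_found = []
    for word in re.findall(r'\w+', description):
        candidate = getattr(builtins, word, None)
        # Keep only names that resolve to a class, such as list or str
        if isinstance(candidate, type):
            types_found.append(candidate)
    return types_found


print(builtin_types_in('list(str)'))           # [<class 'list'>, <class 'str'>]
print(builtin_types_in('Sequence of tokens'))  # []
```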

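But this approach quickly gets complicated, especially when parsing nested entries such as the last field in pos_tag's docstring, list(tuple(str, str)). Also, docstrings are often not in this format at all, so this may only work for nltk, and even then not all of the time.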
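Nested descriptions like `list(tuple(str, str))` need a small recursive parser rather than word splitting. Below is a minimal Python 3 sketch. It assumes only builtin type names and the `outer(inner, ...)` notation seen in nltk's docstrings. `parse_type` and `matches` are hypothetical helpers, not part of nltk or the standard library:

```python
import builtins
import re


def parse_type(text):
    """Parse e.g. 'list(str)' into (list, [(str, [])])."""
    tokens = re.findall(r'\w+|[(),]', text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else ''

    def parse():
        nonlocal pos
        name = peek()
        pos += 1
        cls = getattr(builtins, name, None)
        if not isinstance(cls, type):
            raise ValueError('not a builtin type: %r' % name)
        args = []
        if peek() == '(':
            pos += 1  # consume '('
            args.append(parse())
            while peek() == ',':
                pos += 1  # consume ','
                args.append(parse())
            if peek() != ')':
                raise ValueError('unbalanced parentheses in %r' % text)
            pos += 1  # consume ')'
        return cls, args

    result = parse()
    if pos != len(tokens):
        raise ValueError('unexpected trailing text in %r' % text)
    return result


def matches(value, spec):
    """Recursively check a value against a spec produced by parse_type."""
    cls, args = spec
    if not isinstance(value, cls):
        return False
    if not args:
        return True
    if cls is tuple and len(args) > 1:
        # tuple(a, b): fixed length, one spec per position
        return len(value) == len(args) and all(
            matches(item, arg) for item, arg in zip(value, args))
    # list(x), set(x), tuple(x): every element must match the single spec
    return all(matches(item, args[0]) for item in value)


spec = parse_type('list(tuple(str, str))')
print(matches([('John', 'NNP'), ("'s", 'POS')], spec))  # True
print(matches([('John', 1)], spec))                      # False
```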

(4) Finally, is there a way to know what the return type is?

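Again, I'm afraid not, unless you use a regex like the one above to comb through the docstring. The return type can also vary a lot depending on the argument type(s); consider any function that accepts any iterable. If you want to try extracting this information from the docstring (again, only for docstrings in the exact format of pos_tag's), you could try another regex: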

>>> return_ = re.search(r'(?<=:)rtype:\s*(.*?)(?=\n|$)', getdoc(pos_tag))
>>> if return_:
...     print 'return "type" =', return_.group()
...
return "type" = rtype: list(tuple(str, str))
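Note that `group()` returns the whole match, including the `rtype:` prefix. The Python 3 sketch below captures only the type text with `group(1)`. It uses a stand-in string holding the field list from pos_tag's docstring in place of `getdoc(pos_tag)`:

```python
import re

# Stand-in for getdoc(pos_tag): the field list from nltk's docstring
doc = """:param tokens: Sequence of tokens to be tagged
:type tokens: list(str)
:return: The tagged tokens
:rtype: list(tuple(str, str))"""

# Capture only the text after ':rtype:' on its own line
match = re.search(r'^:rtype:\s*(.*)$', doc, re.MULTILINE)
return_type = match.group(1) if match else None
print('return "type" =', return_type)  # return "type" = list(tuple(str, str))
```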

Otherwise, the best we can do here is get the source code (which, again, you've said you don't want):

>>> import inspect
>>> print inspect.getsource(pos_tag)
def pos_tag(tokens):
    """
    Use NLTK's currently recommended part of speech tagger to
    tag the given list of tokens.

        >>> from nltk.tag import pos_tag
        >>> from nltk.tokenize import word_tokenize
        >>> pos_tag(word_tokenize("John's big idea isn't all that bad."))
        [('John', 'NNP'), ("'s", 'POS'), ('big', 'JJ'), ('idea', 'NN'), ('is',
        'VBZ'), ("n't", 'RB'), ('all', 'DT'), ('that', 'DT'), ('bad', 'JJ'),
        ('.', '.')]

    :param tokens: Sequence of tokens to be tagged
    :type tokens: list(str)
    :return: The tagged tokens
    :rtype: list(tuple(str, str))
    """
    tagger = load(_POS_TAGGER)
    return tagger.tag(tokens)
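For functions that carry Python 3 annotations, no docstring parsing is needed. `inspect.signature` exposes each parameter's `annotation` and the `return_annotation`, and `typing.get_origin` (Python 3.8+) maps a generic such as `List[...]` back to a plain class you can pass to `isinstance`. nltk's `pos_tag` as shown above has no annotations, so this sketch uses a hypothetical annotated `tag_tokens` function:

```python
import inspect
from typing import List, Tuple, get_origin


def tag_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    """Hypothetical annotated tagger that tags every token as 'NN'."""
    return [(token, 'NN') for token in tokens]


sig = inspect.signature(tag_tokens)
print(sig.parameters['tokens'].annotation)
print(sig.return_annotation)

# get_origin(List[...]) is the plain class list, which isinstance accepts
output = tag_tokens(['John', 'is', 'here'])
origin = get_origin(sig.return_annotation)
print(isinstance(output, origin))  # True
```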

Exploring Python functions