Question

I'm type-checking a project that uses pandas. pandas does not ship type annotations, and there are no stub files for it in typeshed.

As I'd expect, mypy raises an error for this simple example:

class A:
    def method(self) -> int:
        return 1


class B(A):
    def method(self) -> float:
        return 1.1

$ mypy mypy_example.py 
mypy_example.py:11: error: Return type "float" of "method" incompatible with return type "int" in supertype "A"

Now consider this example:

import pandas as pd


class C:
    def method(self) -> pd.Series:
        return pd.Series([1, 2, 3])


class D(C):
    def method(self) -> pd.DataFrame:
        return pd.DataFrame({"a": [1, 2, 3]})

As expected, mypy says it cannot find stub files for pandas, so it does not flag the incompatible override in D.

$ mypy mypy_example.py 
mypy_example.py:1: error: No library stub file for module 'pandas'
mypy_example.py:1: note: (Stub files are from https://github.com/python/typeshed)
mypy_example.py:11: error: Return type "float" of "method" incompatible with return type "int" in supertype "A"

I could set ignore_missing_imports, but that means I would miss errors I want to catch.

I've tried a few things in a stub file, without success:

from typing import Any, NewType

# dynamic typing, but doesn't discriminate between Series and DataFrame
Series = Any
DataFrame = Any

# discriminates but doesn't dynamically type
Series = NewType('Series', object)
DataFrame = NewType('DataFrame', object)

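Is it possible to write a short stub file or type annotation that lets me keep the dynamic typing, but still recognizes that pd.Series and pd.DataFrame are different types?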

Answer 1

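Rather than trying to get mypy to distinguish between two dynamic classes, I would take the route of making them not dynamic (or rather, only partially dynamic) by defining them as full-fledged classes inside a stub.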

You can start with a very preliminary and minimal set of stubs by defining two classes like this:

from typing import Any

class Series:
    def __init__(self, *args: Any, **kwargs: Any) -> None: ...
    def __getattr__(self, name: str) -> Any: ...

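# Not a subclass of Series (as in pandas itself); if it were, mypy would
# accept a DataFrame return type as a valid override of a Series one.
class DataFrame: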
    def __init__(self, *args: Any, **kwargs: Any) -> None: ...
    def __getattr__(self, name: str) -> Any: ...

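The __getattr__ function lets mypy know that your class is incomplete and not fully annotated. This means that something like DataFrame().query(...) will continue to type check, even though that method was never explicitly added to your class.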
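To see how this plays out with the example from the question, here is a minimal sketch that inlines stand-in versions of the two classes in a regular module, so it runs without pandas installed. It assumes DataFrame is declared without inheriting from Series. The runtime body of __getattr__ (a placeholder that hands back a no-op callable) is my own addition to make the file executable; in a real .pyi stub the body is just `...`.

```python
from typing import Any


class Series:
    """Stand-in for the stubbed pandas.Series (illustration only)."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        pass

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes that normal lookup cannot find.
        # mypy types every such attribute as Any.
        return lambda *args, **kwargs: None


class DataFrame:
    """Stand-in for the stubbed pandas.DataFrame; unrelated to Series."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        pass

    def __getattr__(self, name: str) -> Any:
        return lambda *args, **kwargs: None


class C:
    def method(self) -> Series:
        return Series([1, 2, 3])


class D(C):
    # mypy is expected to report an incompatible return type here,
    # because DataFrame is not a subtype of Series.
    def method(self) -> DataFrame:
        return DataFrame({"a": [1, 2, 3]})


# query() is declared nowhere, but the call still type checks
# thanks to the __getattr__ fallback.
result = DataFrame({"a": [1, 2, 3]}).query("a > 1")
```

The code runs fine at runtime; the override in D is only rejected when the file is passed through mypy.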

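Of course, if you do decide to add some method signatures, mypy will start type checking those calls instead of leaving them dynamically typed. This means you can incrementally add more precise method signatures as you go, and eventually get rid of __getattr__ altogether.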
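As a sketch of that incremental approach, the stub below gives DataFrame one explicitly typed method and leaves everything else dynamic. The choice of head and its simplified signature are assumptions made for illustration, not a complete description of the pandas API.

```python
from typing import Any


class DataFrame:
    def __init__(self, *args: Any, **kwargs: Any) -> None: ...

    # Explicit signature: mypy now checks calls to head() and knows the
    # result is a DataFrame (simplified signature, assumed for illustration).
    def head(self, n: int = ...) -> "DataFrame": ...

    # Anything not declared above is still dynamically typed.
    def __getattr__(self, name: str) -> Any: ...
```

With this in place, a call like DataFrame().head("five") should be reported as an argument type error, while DataFrame().query(...) keeps falling back to Any.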

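If you decide to go this route, you may find the existing numpy stubs to be a good source of inspiration. And if you want really precise types, the discussion here may be relevant.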

If you're curious, the typeshed guide on writing incomplete stubs has more information about writing partial stubs.

How can I distinguish between two dynamic types?
