Question

Given the following Python script:

# dedupe.py
import re

def dedupe_whitespace(s,spacechars='\t '):
    """Merge repeated whitespace characters.
    Example:
    >>> dedupe_whitespace(r"Green\t\tGround")  # doctest: +REPORT_NDIFF
    'Green\tGround'
    """
    for w in spacechars:
        s = re.sub(r"("+w+"+)", w, s)
    return s

The function works as expected in the Python interpreter:

$ python
>>> import dedupe
>>> dedupe.dedupe_whitespace('Purple\t\tHaze')
'Purple\tHaze'
>>> print dedupe.dedupe_whitespace('Blue\t\tSky')
Blue    Sky

However, the doctest example fails, because the tab characters are converted to spaces before the comparison with the result string:

>>> import doctest, dedupe
>>> doctest.testmod(dedupe)

gives

Failed example:
    dedupe_whitespace(r"Green           Ground")  #doctest: +REPORT_NDIFF
Differences (ndiff with -expected +actual):
    - 'Green  Ground'
    ?       -
    + 'Green Ground'

How can I encode tab characters in the doctest heredoc string so that the test result comparison is performed correctly?

Answer 1

I got this to work by using literal (raw) string notation for the docstring:

def join_with_tab(iterable):
    r"""
    >>> join_with_tab(['1', '2'])
    '1\t2'
    """

    return '\t'.join(iterable)

if __name__ == "__main__":
    import doctest
    doctest.testmod()

Answer 2

It was the raw heredoc string notation (r""") that did the trick:

# filename: dedupe.py
import re,doctest
def dedupe_whitespace(s,spacechars='\t '):
    r"""Merge repeated whitespace characters.
    Example:
    >>> dedupe_whitespace('Black\t\tGround')  #doctest: +REPORT_NDIFF
    'Black\tGround'
    """
    for w in spacechars:
        s = re.sub(r"("+w+"+)", w, s)
    return s

if __name__ == "__main__":
    doctest.testmod()
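
To see why the r prefix matters, the same example can be run through doctest twice, once from an ordinary string and once from a raw string. This is only a sketch, assuming Python 3; PLAIN_DOC, RAW_DOC and run_doc are hypothetical names made up for the illustration, and the docstring text is held in module-level strings so the two variants can be compared side by side.

```python
import doctest
import re


def dedupe_whitespace(s, spacechars='\t '):
    # Same logic as in the question: collapse runs of each whitespace character.
    for w in spacechars:
        s = re.sub(r"(" + w + "+)", w, s)
    return s


# Ordinary string: Python turns \t into real tab characters before doctest sees it.
PLAIN_DOC = """
>>> dedupe_whitespace('Black\t\tGround')
'Black\tGround'
"""

# Raw string: doctest sees the two characters backslash + t.
RAW_DOC = r"""
>>> dedupe_whitespace('Black\t\tGround')
'Black\tGround'
"""


def run_doc(doc):
    # Hypothetical helper: parse one docstring and run its examples quietly.
    parser = doctest.DocTestParser()
    globs = {'dedupe_whitespace': dedupe_whitespace}
    test = parser.get_doctest(doc, globs, 'demo', '<sketch>', 0)
    runner = doctest.DocTestRunner(verbose=False)
    return runner.run(test, out=lambda text: None)


plain_result = run_doc(PLAIN_DOC)  # hard tabs get expanded to spaces, so this fails
raw_result = run_doc(RAW_DOC)      # escapes survive until the example is executed
```

With the ordinary string the single example fails, as in the question; with the raw string it passes.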

Answer 3

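This is basically YatharhROCK's answer, but a bit more explicit. You can use raw strings or double escaping. But why?

You need the string literal to contain valid Python code that, once interpreted, is the code you want to run/test. Both of these work: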

#!/usr/bin/env python

def split_raw(val, sep='\n'):
  r"""Split a string on newlines (by default).

  >>> split_raw('alpha\nbeta\ngamma')
  ['alpha', 'beta', 'gamma']
  """
  return val.split(sep)


def split_esc(val, sep='\n'):
  """Split a string on newlines (by default).

  >>> split_esc('alpha\\nbeta\\ngamma')
  ['alpha', 'beta', 'gamma']
  """
  return val.split(sep)

import doctest
doctest.testmod()

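Both the raw string and the double escaping (escaping the backslash) leave two characters in the docstring: a backslash and an n. That text is then passed to the Python interpreter, which reads "backslash then n" inside a string literal as "newline character".

Use whichever you prefer.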
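
The equivalence can be checked without running doctest at all. A minimal sketch, assuming Python 3; the variable names are made up for the illustration:

```python
# Two spellings of the same doctest line: raw string vs. doubled backslashes.
raw_doc = r">>> 'alpha\nbeta'.split('\n')"
esc_doc = ">>> 'alpha\\nbeta'.split('\\n')"

# Single backslashes in an ordinary string: Python has already made real newlines.
plain_doc = ">>> 'alpha\nbeta'.split('\n')"

same_text = raw_doc == esc_doc               # identical characters either way
backslash_n_pairs = raw_doc.count('\\n')     # backslash followed by n, twice
raw_has_newline = '\n' in raw_doc            # no real newline in the raw text
plain_has_newline = '\n' in plain_doc        # the plain text is split across lines
```

Only the first two spellings hand doctest a line it can compile as written.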

Answer 4

You must set NORMALIZE_WHITESPACE. ~~Or, alternatively, capture the output and compare it to the expected value:~~

def dedupe_whitespace(s,spacechars='\t '):
    """Merge repeated whitespace characters.
    Example:
    >>> output = dedupe_whitespace(r"Black\t\tGround")  #doctest: +REPORT_NDIFF
    >>> output == 'Black\tGround'
    True
    """

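From the doctest documentation, section How are Docstring Examples Recognized?:

All hard tab characters are expanded to spaces, using 8-column tab stops. Tabs in output generated by the tested code are not modified. Because any hard tabs in the sample output are expanded, this means that if the code output includes hard tabs, the only way the doctest can pass is if the NORMALIZE_WHITESPACE option or directive is in effect. Alternatively, the test can be rewritten to capture the output and compare it to an expected value as part of the test. This handling of tabs in the source was arrived at through trial and error, and has proven to be the least error prone way of handling them. It is possible to use a different algorithm for handling tabs by writing a custom DocTestParser class.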

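Edit: My mistake, I read the docs the wrong way around. Tabs are expanded to 8-column tab stops both in the string argument passed to dedupe_whitespace and in the string literal it is compared with on the next line, so output contains: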

"Black Ground"

and is being compared against:

"Black Ground"

I can't find a way around this limitation short of writing your own DocTestParser or testing for deduplicated spaces instead of tabs.
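
The tab handling described in the quoted documentation can be observed directly. The following is a rough sketch, assuming Python 3; HARD_TAB_DOC and run_doc are hypothetical names, and the example deliberately keeps the hard tab in the expected output only, with the source line using the two characters backslash and t.

```python
import doctest

# Source line: backslash + t (an escape). Expected-output line: a real hard tab.
HARD_TAB_DOC = ">>> print('a\\tb')\na\tb\n"

parser = doctest.DocTestParser()

# The parser expands the hard tab in the expected output to the next 8-column stop.
expanded_want = parser.get_examples(HARD_TAB_DOC)[0].want


def run_doc(doc, flags=0):
    # Hypothetical helper: run the examples in one docstring with the given option flags.
    test = parser.get_doctest(doc, {}, 'tabs', '<sketch>', 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=flags)
    return runner.run(test, out=lambda text: None)


strict = run_doc(HARD_TAB_DOC)                                 # spaces vs. tab: fails
normalized = run_doc(HARD_TAB_DOC, doctest.NORMALIZE_WHITESPACE)  # whitespace runs compare equal
```

Note that NORMALIZE_WHITESPACE only relaxes the comparison of output; hard tabs in the source line of an example are still expanded before the code runs, which is the limitation described in the edit above.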

Answer 5

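TL;DR: Escape the backslash, i.e., use \\n or \\t instead of \n or \t in your otherwise unmodified strings.

You probably don't want to make your docstrings raw, because then you won't be able to use any Python string escapes, including the ones you might actually want.

For an approach that still supports normal escapes, just escape the backslash in the backslash-character escape, so that after Python interprets the literal it leaves a literal backslash followed by the character, which doctest can then parse.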
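
As an illustration of that trade-off, here is a small sketch, assuming Python 3; tabbed is a hypothetical function invented for the example. The docstring is an ordinary (non-raw) string, so the \u2192 escape in its summary line still works, while the doubled backslash keeps the tab escape intact for doctest.

```python
import doctest


def tabbed(a, b):
    """Join two fields with a tab (a \u2192 tab \u2192 b).

    >>> tabbed('x', 'y')
    'x\\ty'
    """
    return a + '\t' + b


# Run the docstring's example directly instead of relying on testmod().
parser = doctest.DocTestParser()
test = parser.get_doctest(tabbed.__doc__, {'tabbed': tabbed}, 'tabbed', '<sketch>', 0)
result = doctest.DocTestRunner(verbose=False).run(test, out=lambda text: None)

arrow_kept = '\u2192' in tabbed.__doc__  # the ordinary escape was still interpreted
```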

Answer 6

I got it to work by escaping the tab characters in the expected string:

>>> function_that_returns_tabbed_text()
'\\t\\t\\tsometext\\t\\t'

instead of

>>> function_that_returns_tabbed_text()
\t\t\tsometext\t\t

How do I include special characters (tab, newline) in a Python doctest result string?