Question

When I use triple-quoted multiline strings in Python, I tend to use textwrap.dedent to keep the code readable and nicely indented:

some_string = textwrap.dedent("""
    First line
    Second line
    ...
    """).strip()

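However, in Python 3.x, textwrap.dedent doesn't seem to work with byte strings. I ran into this while writing a unit test for a method that returns a long multiline byte string, for example: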

# The function to be tested

def some_function():
    return b'Lorem ipsum dolor sit amet\n  consectetuer adipiscing elit'

# Unit test

import unittest
import textwrap

class SomeTest(unittest.TestCase):
    def test_some_function(self):
        self.assertEqual(some_function(), textwrap.dedent(b"""
            Lorem ipsum dolor sit amet
              consectetuer adipiscing elit
            """).strip())

if __name__ == '__main__':
    unittest.main()

In Python 2.7.10 the above code works fine, but in Python 3.4.3 it fails:

E
======================================================================
ERROR: test_some_function (__main__.SomeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 16, in test_some_function
    """).strip())
  File "/usr/lib64/python3.4/textwrap.py", line 416, in dedent
    text = _whitespace_only_re.sub('', text)
TypeError: can't use a string pattern on a bytes-like object

----------------------------------------------------------------------
Ran 1 test in 0.001s

FAILED (errors=1)

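So: is there an alternative to textwrap.dedent that works with byte strings?

I could write such a function myself, but if there is an existing one, I would prefer to use it.
I could convert to unicode, use textwrap.dedent, and convert back to bytes. But this is only viable if the byte string conforms to some Unicode encoding.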

Answer 1

It seems dedent does not support byte strings, sadly. However, if you want cross-compatible code, I suggest you take advantage of the six library:

import sys, unittest
from textwrap import dedent

import six


def some_function():
    return b'Lorem ipsum dolor sit amet\n  consectetuer adipiscing elit'


class SomeTest(unittest.TestCase):
    def test_some_function(self):
        actual = some_function()

        expected = six.b(dedent("""
            Lorem ipsum dolor sit amet
              consectetuer adipiscing elit
            """)).strip()

        self.assertEqual(actual, expected)

if __name__ == '__main__':
    unittest.main()

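This is similar to the last bullet point suggestion in the question:

I could convert to unicode, use textwrap.dedent, and convert back to bytes. But this is only viable if the byte string conforms to some Unicode encoding.

But you are misunderstanding something about encodings here. If you can write the string literal in your test like that in the first place, and Python parses the file successfully (i.e. the module carries the correct coding declaration), then there is no "convert to unicode" step here. The file is parsed in the specified encoding (or in the default source encoding, UTF-8 on Python 3, if you didn't specify one), so by the time the string is a Python variable it has already been decoded.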
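The quoted round trip still matters when the bytes arrive as a runtime value rather than as a literal in the source. Below is a minimal sketch of that case, assuming latin-1 as the carrier encoding (my choice, not something the question or this answer prescribes). The helper name `dedent_bytes_roundtrip` is hypothetical. latin-1 maps every byte value to exactly one code point, so the decode step cannot fail even for bytes that are not valid UTF-8. I have not checked how lines made up only of unusual bytes (for example 0xA0) are treated.

```python
import textwrap


def dedent_bytes_roundtrip(data):
    """Hypothetical helper: dedent bytes by round-tripping through str."""
    # latin-1 maps each byte value 0-255 to a single code point,
    # so decoding never raises, whatever the bytes contain.
    text = data.decode("latin-1")
    # Encoding with the same codec restores the original byte values.
    return textwrap.dedent(text).encode("latin-1")


# The second sample contains bytes that are not valid UTF-8.
ascii_sample = b"\n    Lorem ipsum dolor sit amet\n      consectetuer adipiscing elit\n    "
binary_sample = b"\n    caf\xe9\n      \xff\xfe tail\n    "

ascii_result = dedent_bytes_roundtrip(ascii_sample).strip()
binary_result = dedent_bytes_roundtrip(binary_sample)

print(ascii_result)
print(binary_result)
```

The relative indentation of the second line survives in both samples; only the common margin is removed.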

Answer 2

Answer 2: textwrap is primarily about the TextWrapper class and functions. dedent is listed under

# -- Loosely related functionality --------------------

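As near as I can tell, the only things that make it text (unicode str) specific are the literals. I prefixed all 6 with b and voila! (I did not edit anything else, but the function docstring should be adjusted.)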

import re

_whitespace_only_re = re.compile(b'^[ \t]+$', re.MULTILINE)
_leading_whitespace_re = re.compile(b'(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)

def dedent_bytes(text):
    """Remove any common leading whitespace from every line in `text`.

    This can be used to make triple-quoted strings line up with the left
    edge of the display, while still presenting them in the source code
    in indented form.

    Note that tabs and spaces are both treated as whitespace, but they
    are not equal: the lines "  hello" and "\\thello" are
    considered to have no common leading whitespace.  (This behaviour is
    new in Python 2.5; older versions of this module incorrectly
    expanded tabs before searching for common leading whitespace.)
    """
    # Look for the longest leading string of spaces and tabs common to
    # all lines.
    margin = None
    text = _whitespace_only_re.sub(b'', text)
    indents = _leading_whitespace_re.findall(text)
    for indent in indents:
        if margin is None:
            margin = indent

        # Current line more deeply indented than previous winner:
        # no change (previous winner is still on top).
        elif indent.startswith(margin):
            pass

        # Current line consistent with and no deeper than previous winner:
        # it's the new winner.
        elif margin.startswith(indent):
            margin = indent

        # Find the largest common whitespace between current line
        # and previous winner.
        else:
            for i, (x, y) in enumerate(zip(margin, indent)):
                if x != y:
                    margin = margin[:i]
                    break
            else:
                margin = margin[:len(indent)]

    # sanity check (testing/debugging only)
    if 0 and margin:
        for line in text.split(b"\n"):
            assert not line or line.startswith(margin), \
                   "line = %r, margin = %r" % (line, margin)

    if margin:
        text = re.sub(rb'(?m)^' + margin, b'', text)
    return text

print(dedent_bytes(b"""
            Lorem ipsum dolor sit amet
              consectetuer adipiscing elit
            """)
      )

# prints
b'\nLorem ipsum dolor sit amet\n  consectetuer adipiscing elit\n'

Answer 3

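Answer 1: Triple-quoted multiline strings (and dedent) are a convenience (sometimes), not a necessity. You can instead write a separate bytes literal for each line, ending in b'\n', and let the parser join them. For example: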

>>> b = (
    b'Lorem ipsum dolor sit amet\n' # first line
    b'consectetuer adipiscing elit\n' # 2nd line
    )
>>> b
b'Lorem ipsum dolor sit amet\nconsectetuer adipiscing elit\n'

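I intentionally added whitespace and comments to the code that are not wanted in the resulting bytes, and they are not included in it. I sometimes do the same with text strings.

Answer 2: Convert textwrap.dedent to process bytes (see separate answer).

Answer 3: Omit the b prefix and add .encode() before or after the .strip().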

import textwrap

print(textwrap.dedent("""
            Lorem ipsum dolor sit amet
              consectetuer adipiscing elit
            """).encode())
# prints (same as Answer 2).
b'\nLorem ipsum dolor sit amet\n  consectetuer adipiscing elit\n'
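Applied to the unit test from the question, Answer 3 could look roughly like the sketch below. It assumes the expected text is plain ASCII, so the default UTF-8 encoding used by .encode() yields the same bytes; some_function is copied from the question.

```python
import textwrap
import unittest


def some_function():
    # Function under test, copied from the question.
    return b'Lorem ipsum dolor sit amet\n  consectetuer adipiscing elit'


class SomeTest(unittest.TestCase):
    def test_some_function(self):
        # Write the expected value as an ordinary str literal, dedent it,
        # strip the surrounding newlines, then encode to bytes
        # (str.encode() defaults to UTF-8).
        expected = textwrap.dedent("""
            Lorem ipsum dolor sit amet
              consectetuer adipiscing elit
            """).strip().encode()
        self.assertEqual(some_function(), expected)
```

Saved in a module, this can be run with python -m unittest.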

Using textwrap.dedent() with bytes in Python 3
