When I use triple-quoted multiline strings in Python, I tend to use textwrap.dedent so that the code stays readable and nicely indented:
some_string = textwrap.dedent("""
    First line
    Second line
    ...
    """).strip()
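However, in Python 3.x, textwrap.dedent does not seem to work with byte strings. I ran into this while writing a unit test for a method that returns a long multiline byte string, for example: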
# The function to be tested
def some_function():
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'


# Unit test
import unittest
import textwrap


class SomeTest(unittest.TestCase):
    def test_some_function(self):
        self.assertEqual(some_function(), textwrap.dedent(b"""
            Lorem ipsum dolor sit amet
             consectetuer adipiscing elit
            """).strip())

if __name__ == '__main__':
    unittest.main()
In Python 2.7.10 the code above works fine, but in Python 3.4.3 it fails:
E
======================================================================
ERROR: test_some_function (__main__.SomeTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 16, in test_some_function
    """).strip())
  File "/usr/lib64/python3.4/textwrap.py", line 416, in dedent
    text = _whitespace_only_re.sub('', text)
TypeError: can't use a string pattern on a bytes-like object
----------------------------------------------------------------------
Ran 1 test in 0.001s
FAILED (errors=1)
So: is there an alternative to textwrap.dedent that works with byte strings?
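Answer 0 (score: 3)
Sadly, it seems that dedent does not support byte strings. However, if you want code that works on both Python 2 and Python 3, I recommend using the six library: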
import sys, unittest
from textwrap import dedent

import six


def some_function():
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'


class SomeTest(unittest.TestCase):
    def test_some_function(self):
        actual = some_function()

        expected = six.b(dedent("""
            Lorem ipsum dolor sit amet
             consectetuer adipiscing elit
            """)).strip()

        self.assertEqual(actual, expected)

if __name__ == '__main__':
    unittest.main()
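This is similar to the bullet-point suggestion in the question:
"I could convert to unicode, use textwrap.dedent, and convert back to bytes. But this is only viable if the byte string conforms to some Unicode encoding."
But you are misunderstanding something about encodings here. If you can write the string literal in your test like that in the first place, and Python parses the file successfully (that is, the module carries the correct encoding declaration), then there is no "convert to unicode" step. The file is parsed in the specified encoding (or in the default encoding, see sys.getdefaultencoding(), if you did not specify one), so by the time the string is a Python variable it has already been decoded.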
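The decode, dedent, encode round trip quoted above can also be applied to arbitrary byte strings by picking an encoding in which every byte value is valid. The following is a minimal sketch, assuming latin-1 (which maps each byte 0-255 to the code point of the same number). The helper name dedent_bytes_via_latin1 is made up for illustration and is not part of textwrap:

```python
import textwrap


def dedent_bytes_via_latin1(data):
    """Hypothetical helper: dedent a bytes object via a latin-1 round trip."""
    # latin-1 decodes every possible byte, so this step cannot fail.
    text = data.decode("latin-1")
    # textwrap.dedent only accepts str; encode the result back to bytes.
    return textwrap.dedent(text).encode("latin-1")


raw = b"""
    Lorem ipsum \xff dolor
      consectetuer adipiscing elit
    """
result = dedent_bytes_via_latin1(raw).strip()
print(result)
```

Whitespace-only lines are still normalized by textwrap.dedent, just as they would be for str input.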
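Answer 1 (score: 2)
Answer 2: textwrap is primarily about the TextWrapper class and the functions built on it. dedent is listed under
# -- Loosely related functionality --------------------
As near as I can tell, the only things that make it specific to text (unicode str) are the literals. I prefixed all 6 of them with b and voila! (I did not edit anything else, but the function docstring should be adjusted.)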
import re

_whitespace_only_re = re.compile(b'^[ \t]+$', re.MULTILINE)
_leading_whitespace_re = re.compile(b'(^[ \t]*)(?:[^ \t\n])', re.MULTILINE)

def dedent_bytes(text):
    """Remove any common leading whitespace from every line in `text`.

    This can be used to make triple-quoted strings line up with the left
    edge of the display, while still presenting them in the source code
    in indented form.

    Note that tabs and spaces are both treated as whitespace, but they
    are not equal: the lines "  hello" and "\\thello" are
    considered to have no common leading whitespace. (This behaviour is
    new in Python 2.5; older versions of this module incorrectly
    expanded tabs before searching for common leading whitespace.)
    """
    # Look for the longest leading string of spaces and tabs common to
    # all lines.
    margin = None
    text = _whitespace_only_re.sub(b'', text)
    indents = _leading_whitespace_re.findall(text)
    for indent in indents:
        if margin is None:
            margin = indent

        # Current line more deeply indented than previous winner:
        # no change (previous winner is still on top).
        elif indent.startswith(margin):
            pass

        # Current line consistent with and no deeper than previous winner:
        # it's the new winner.
        elif margin.startswith(indent):
            margin = indent

        # Find the largest common whitespace between current line
        # and previous winner.
        else:
            for i, (x, y) in enumerate(zip(margin, indent)):
                if x != y:
                    margin = margin[:i]
                    break
            else:
                margin = margin[:len(indent)]

    # sanity check (testing/debugging only)
    if 0 and margin:
        for line in text.split(b"\n"):
            assert not line or line.startswith(margin), \
                   "line = %r, margin = %r" % (line, margin)

    if margin:
        text = re.sub(rb'(?m)^' + margin, b'', text)
    return text

print(dedent_bytes(b"""
      Lorem ipsum dolor sit amet
       consectetuer adipiscing elit
      """)
      )

# prints
b'\nLorem ipsum dolor sit amet\n consectetuer adipiscing elit\n'
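Why the b prefixes are enough: the re module requires the pattern and the subject to be of the same type, so a str pattern is rejected for bytes input while a bytes pattern is accepted. A small sketch of that rule follows; the variable names are illustrative only:

```python
import re

data = b"    \nfoo\n"

# Same whitespace-only pattern, once as str and once as bytes.
str_pattern = re.compile(r'^[ \t]+$', re.MULTILINE)
bytes_pattern = re.compile(rb'^[ \t]+$', re.MULTILINE)

error = None
try:
    # Mixing a str pattern with a bytes subject raises TypeError.
    str_pattern.sub('', data)
except TypeError as exc:
    error = exc

# A bytes pattern with a bytes replacement works on bytes input.
cleaned = bytes_pattern.sub(b'', data)
print(type(error).__name__, cleaned)
```

This is the same TypeError that appears in the traceback in the question, raised from inside textwrap.dedent.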
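Answer 2 (score: 0)
Answer 1: Triple-quoted multiline strings (and dedent) are a convenience (sometimes), not a necessity. You can instead write a separate bytes literal for each line, each ending with \n, and let the parser join them. For example: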
>>> b = (
    b'Lorem ipsum dolor sit amet\n'  # first line
    b'consectetuer adipiscing elit\n'  # 2nd line
    )
>>> b
b'Lorem ipsum dolor sit amet\nconsectetuer adipiscing elit\n'
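I deliberately added whitespace and comments to the code; they are not wanted in the resulting bytes, and they are not included in it. I sometimes do the same with text strings.
Answer 2: Convert textwrap.dedent to process bytes (see the separate answer).
Answer 3: Omit the b prefix and add .encode() before or after .strip().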
print(textwrap.dedent("""
      Lorem ipsum dolor sit amet
       consectetuer adipiscing elit
      """).encode())

# prints (same as Answer 2).
b'\nLorem ipsum dolor sit amet\n consectetuer adipiscing elit\n'
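Applied to the unit test from the question, Answer 3 might look like the sketch below. It assumes Python 3 and an ASCII-only expected value. some_function is copied from the question as a stand-in for the real code under test:

```python
import textwrap
import unittest


def some_function():
    # Stand-in for the function under test, copied from the question.
    return b'Lorem ipsum dolor sit amet\n consectetuer adipiscing elit'


class SomeTest(unittest.TestCase):
    def test_some_function(self):
        # Write the expected value as str, dedent it, then encode to bytes.
        expected = textwrap.dedent("""
            Lorem ipsum dolor sit amet
             consectetuer adipiscing elit
            """).strip().encode("ascii")
        self.assertEqual(some_function(), expected)
```

Saved as a module, this can be run with python -m unittest. Calling .encode() after .strip() keeps all the text processing on str and converts only once at the end.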