Question

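I wrote a short piece of Perl code that breaks a long string of text into lines of 40 characters or fewer (by inserting line breaks between words, never in the middle of a word).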

$text =~ s!(.{0,40})\s+!$1\n!g;

What is the Python 3 equivalent? Thanks very much.

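Solution:

answer = re.sub(r'(.{0,40})\s', r'\1\n', my_text + " ")

I changed the solution recommended below so that line breaks are not inserted inside words.

Edit: I added a space at the end of the text so that the last word is not put on a line of its own when the user's text does not end with a line break (or some other whitespace).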
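
For reference, the substitution above can be wrapped in a small function. This is only a minimal sketch: the name `wrap_at_whitespace` and the `width` parameter are illustrative and do not appear in the original post. Note that a word longer than `width` is left intact, so its line ends up longer than the limit.

```python
import re

def wrap_at_whitespace(text, width=40):
    """Break `text` at whitespace, aiming for lines of at most `width` characters.

    Hypothetical helper that mirrors the re.sub call from the question.
    """
    # Greedily take up to `width` characters, then require one whitespace
    # character, which is replaced by a newline.
    pattern = r'(.{0,%d})\s' % width
    # The appended space lets the final word be consumed by the same match.
    return re.sub(pattern, r'\1\n', text + " ")

wrapped = wrap_at_whitespace(
    "This is a very long string with spaces or maybewithoutspaces"
)
print(wrapped)
```

Because of the appended space, the result always ends with a newline; call `.rstrip("\n")` on it if that is not wanted.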

Answer 1

Python comes with batteries included: use textwrap.fill:

In [15]: import textwrap

In [16]: print(textwrap.fill('This is a very long string with spaces or maybewithoutspaces', width=40))
This is a very long string with spaces
or maybewithoutspaces

In [17]: print(textwrap.fill('Thisisaverylongstringwithspacesormaybewithoutspaces', width=40))
Thisisaverylongstringwithspacesormaybewi
thoutspaces

Note, however, that textwrap.fill tries to break on whitespace. A more literal translation of your Perl code would be:

text = re.sub(r'(.{0,40})', r'\1\n', text)

For example,

In [18]: import re

In [19]: print(re.sub(r'(.{0,40})', r'\1\n', 'This is a very long string with spaces or maybewithoutspaces'))
This is a very long string with spaces o
r maybewithoutspaces

In [22]: print(re.sub(r'(.{0,40})', r'\1\n', 'Thisisaverylongstringwithspacesormaybewithoutspaces'))
Thisisaverylongstringwithspacesormaybewi
thoutspaces
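
If words should never be split, as the question asks, textwrap can also be told to leave over-long words alone. The sketch below uses the `break_long_words` keyword of `textwrap.fill`; the variable names are illustrative only.

```python
import textwrap

spaced = 'This is a very long string with spaces or maybewithoutspaces'
unspaced = 'Thisisaverylongstringwithspacesormaybewithoutspaces'

# Text containing spaces is wrapped at whitespace either way.
wrapped_spaced = textwrap.fill(spaced, width=40, break_long_words=False)

# A single word longer than the width is kept whole on its own line,
# so that line is allowed to exceed 40 characters.
wrapped_unspaced = textwrap.fill(unspaced, width=40, break_long_words=False)

print(wrapped_spaced)
print(wrapped_unspaced)
```

The trade-off is that the width is then only a target: a line holding a single long word can be wider than `width`.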

Answer 2

You can use the textwrap module

import textwrap
txt = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaafdafsdafdafaddafsfdsf'

print(textwrap.fill(txt, width=40))

Output

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaafdafsdafdafaddafsfdsf
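
When the individual lines are needed rather than one joined string, `textwrap.wrap` returns them as a list; `textwrap.fill` is the same result joined with newlines. A short sketch using the same sample text (the name `lines` is illustrative):

```python
import textwrap

txt = 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaafdafsdafdafaddafsfdsf'

# wrap() returns a list of lines instead of a single string.
lines = textwrap.wrap(txt, width=40)

for number, line in enumerate(lines, start=1):
    print(number, len(line), line)
```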

Answer 3

Or just use slicing

N = 40
s = '-'*(4*N+N//2)

print(*(s[i:i + N] for i in range(0, len(s), N)), sep="\n")

Or, after importing more_itertools:

from more_itertools import sliced
print(*sliced(s, 40), sep="\n")
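
To make the slicing step explicit, here is a minimal sketch of the same idea as a generator. The name `chunks` is hypothetical and not taken from the answer. Like the snippets above, it cuts at fixed positions and pays no attention to word boundaries.

```python
def chunks(s, n):
    """Yield consecutive slices of `s`, each at most `n` characters long."""
    for i in range(0, len(s), n):
        yield s[i:i + n]

N = 40
s = '-' * (4 * N + N // 2)  # 180 characters: four full chunks plus half a chunk

parts = list(chunks(s, N))
print([len(p) for p in parts])
```

Only the last chunk can be shorter than `N`, and joining the chunks back together reproduces the original string.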

Edit: a quick comparison

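import re
from more_itertools import sliced  # third-party package

N = 40
s = '-'*(8000000*40+40//2)  # roughly 320 million characters


# `timeit` below is a custom timing decorator that the answer does not define.
@timeit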
def slicer1(s):
    return "\n".join(s[i:i + N] for i in range(0, len(s), N))

@timeit
def slicer2(s):
    return "\n".join(sliced(s, N))

@timeit
def slicer3(s):
    return re.sub(r'(.{0,'+str(N)+'})', r'\1\n', s)


slicer1(s)
slicer2(s)
slicer3(s)

Method 2 should be fast enough and is the simplest approach:

Function slicer1(s), took: 1.9553 seconds.
Function slicer2(s), took: 2.9460 seconds.
Function slicer3(s), took: 12.6048 seconds.
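
The `timeit` decorator used in the comparison is not shown in the answer. One possible version, assuming it simply measures wall-clock time around the call and prints a line loosely modelled on the output above, could look like the sketch below. Everything in it is an assumption made for illustration, not the answerer's actual code.

```python
import functools
import time

def timeit(func):
    """Hypothetical timing decorator; the original answer does not show its own."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # Report format loosely modelled on the output quoted above.
        print(f"Function {func.__name__}, took: {elapsed:.4f} seconds.")
        return result
    return wrapper

@timeit
def slicer1(s, n=40):
    # Same slicing approach as in the comparison, on a much smaller input.
    return "\n".join(s[i:i + n] for i in range(0, len(s), n))

out = slicer1('-' * 100)
```

For more careful measurements, the standard library's `timeit` module repeats the call several times, which smooths out one-off noise.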

How do I "pythonize" this Perl substitution regex?
