Question

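I am looking for the best way to reformat a string read in from a text file in Python so that each line is at most a given length, without breaking words. I used the wrap function from the textwrap module. It works in all cases except when the text being read in contains newlines, i.e. when it contains paragraphs. The wrap function does not preserve these newlines, which is a problem. Here is my code: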

import textwrap

with open(inFile, 'r') as f:  # read in text file
    lines = f.read()

paragraph = textwrap.wrap(lines, width=wid)  # format paragraph

with open(outFile, 'w') as f:  # write to file
    for i in paragraph:
        print(i, file=f)

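One idea I had was to print the formatted text to the output file one line at a time; the only problem is that I don't know how to test whether a line is just a newline.

Any suggestions would be highly appreciated.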

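Update: after using Ooga's suggestion, the newlines are preserved correctly, but that leaves me with one last problem: the output lines themselves, and which text ends up on each line. Compare the input, the expected output, and the actual output to see what I mean.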

Input:

log2(N) is about the expected number of probes in an average
successful search, and the worst case is log2(N), just one more
probe. If the list is empty, no probes at all are made. Thus binary
search is a logarithmic algorithm and executes in O(logN) time. In
most cases it is considerably faster than a linear search. It can
be implemented using iteration, or recursion. In some languages it
is more elegantly expressed recursively; however, in some C-based
languages tail recursion is not eliminated and the recursive
version requires more stack space.

Expected output:

log2(N) is about the expected number of
probes in an average successful search,
and the worst case is log2(N), just one
more probe. If the list is empty, no 
probes at all are made. Thus binary 
search is a logarithmic algorithm and 
executes in O(logN) time. In most cases
it is considerably faster than a linear 
search. It can be implemented using 
iteration, or recursion. In some 
languages it is more elegantly expressed
recursively; however, in some C-based
languages tail recursion is not
eliminated and the recursive version
requires more stack space.

Actual output:

log2(N) is about the expected number of
probes in an average
successful search, and the worst case is
log2(N), just one more
probe. If the list is empty, no probes
at all are made. Thus binary
search is a logarithmic algorithm and
executes in O(logN) time. In
most cases it is considerably faster
than a linear search. It can
be implemented using iteration, or
recursion. In some languages it
is more elegantly expressed recursively;
however, in some C-based
languages tail recursion is not
eliminated and the recursive
version requires more stack space.

Just to confirm, this is a single paragraph, since newlines are now being preserved. How can I make the output match the expected output?

Answer 1

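from textwrap import wrap

# Iterating over the file yields one physical line at a time;
# each line is wrapped on its own and the pieces are collected in order.
with open(inFile) as inf:
    lines = [line for para in inf for line in wrap(para, wid)]

# Write the wrapped lines back out, separated by newlines.
with open(outFile, "w") as outf:
    outf.write("\n".join(lines))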
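
One caveat worth checking with this line-by-line approach: every physical line of the input is wrapped independently, so a long line is cut into one full-width piece plus a short remainder, and a blank line contributes nothing at all because wrap() returns an empty list for it. The small sketch below illustrates both effects; the helper name wrap_each_line and the sample lines are only illustrative, not part of the original answer.

```python
from textwrap import wrap


def wrap_each_line(lines, width):
    """Wrap every input line separately, as the list comprehension above does."""
    return [piece for line in lines for piece in wrap(line, width)]


# Two physical lines of one paragraph, followed by a blank separator line.
sample = [
    "log2(N) is about the expected number of probes in an average\n",
    "successful search, and the worst case is log2(N), just one more\n",
    "\n",
]

for piece in wrap_each_line(sample, 40):
    print(piece)
```

The short pieces after each full-width line are the same pattern shown in the "Actual output" in the question, and the blank line is dropped rather than preserved.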

Answer 2

You can read the file one line at a time.

import textwrap

inFile = 'testIn.txt'
outFile = 'testOut.txt'
wid = 20

fin = open(inFile,'r')
fout = open(outFile, 'w')

for lineIn in fin:
  paragraph = textwrap.wrap(lineIn, width=wid)
  if paragraph:
    for lineOut in paragraph:
      print(lineOut, file=fout)
  else:
    print('', file=fout)

fout.close()
fin.close()
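
To get the "expected output" from the update, one possible approach is to treat a run of non-blank lines as a single paragraph: join its lines first, then wrap the joined text, and keep a blank line between paragraphs. The following is a minimal sketch under that assumption (paragraphs separated by blank lines); the function name reflow is hypothetical and not from either answer.

```python
import re
import textwrap


def reflow(text, width):
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    # Split on blank lines (optionally containing whitespace) to find paragraphs.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    wrapped = []
    for para in paragraphs:
        # Collapse the existing line breaks and extra spaces into single spaces,
        # so the old line structure does not influence the new one.
        joined = " ".join(para.split())
        wrapped.append(textwrap.fill(joined, width=width))
    # Put one blank line back between paragraphs.
    return "\n\n".join(wrapped)


if __name__ == "__main__":
    demo = "aaa bbb\nccc ddd\n\neee fff\n"
    print(reflow(demo, 11))
```

With a file, the same function could be applied to the result of f.read() and the returned string written to the output file. Note that this sketch normalizes any run of blank lines to a single blank line.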

Reformatting a string in Python
