How do I limit the number of characters per line without breaking words?

Asked: 2014-05-15 20:15:45

Tags: python algorithm python-3.x

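I'm trying to open a text file that contains some paragraphs and limit each line to a maximum width of X characters. However, my algorithm is flawed: it cuts words in half. I'm not sure how to fix this, and I'm also not sure how to make it break lines.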

I looked at textwrap, but I don't want to use it at this point because I want to improve my algorithmic skills.

So my algorithm starts by opening the file:

f = open("file.txt", "r", encoding="utf-8")
lines = f.readlines()
f.close()

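Now I have a list of all the lines. This is where I'm stuck. How can I limit the length of each line when printing?

I'm really not sure how to approach this and would appreciate some help.

Thanks.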

5 answers:

Answer 0 (score: 4)

You can use the standard textwrap module:

import textwrap
txt = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
print('\n'.join(textwrap.wrap(txt, 20, break_long_words=False)))

First of all, for reading the file you should use the with construct:

with open(filename, 'r') as f:
    lines = f.readlines()

def wrap(line):
    broken = textwrap.wrap(line, 20, break_long_words=False)
    return '\n'.join(broken)

wrapped = [wrap(line) for line in lines]
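
A small variation on the snippet above, as a rough sketch: textwrap.fill() does the wrap-and-join in one call, and splitting on line boundaries first avoids carrying the trailing newline that readlines() leaves on each line. The name wrap_text and the sample string are made up for illustration.

```python
import textwrap


def wrap_text(text, width=20):
    """Wrap every input line (one paragraph per line) without splitting words."""
    wrapped = []
    for paragraph in text.splitlines():
        # fill() is wrap() followed by '\n'.join(); an empty line stays empty
        wrapped.append(textwrap.fill(paragraph, width, break_long_words=False))
    return "\n".join(wrapped)


sample = ("Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n"
          "Sed sodales tellus non semper feugiat.")
print(wrap_text(sample, 20))
```

With break_long_words=False a word longer than the width is left whole, so such a line can still exceed the limit.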

But you said you don't want to use the built-in textwrap and would rather do it yourself, so here is a solution that needs no imports:

lorem = """Lorem ipsum dolor sit amet, consectetur adipiscing elit. 
Phasellus ac commodo libero, at dictum leo. Nunc convallis est id purus porta,  
malesuada erat volutpat. Cras commodo odio nulla. Nam vehicula risus id lacus 
vestibulum. Maecenas aliquet iaculis dignissim. Phasellus aliquam facilisis  
pellentesque ultricies. Vestibulum dapibus quam leo, sed massa ornare eget. 
Praesent euismod ac nulla in lobortis. 
Sed sodales tellus non semper feugiat."""

def wrapped_lines(line, width=80):
    whitespace = set(" \n\t\r")
    length = len(line)
    start = 0

    while start < (length - width):
        # we take next 'width' of characters:
        chunk = line[start:start+width+1]
        # if there is a newline in it, let's return first part
        if '\n' in chunk:
            end = start + chunk.find('\n')
            yield line[start:end]
            start = end+1 # we set new start on place where we are now
            continue

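        # if no newline in chunk, let's find the first whitespace from the end
        for i, ch in enumerate(reversed(chunk)):
            if ch in whitespace:
                end = (start+width-i)
                yield line[start:end]
                start = end + 1
                break
        else:
            # no whitespace in chunk (a word longer than 'width'):
            # hard-break it so that the while loop always makes progress
            yield line[start:start+width]
            start += width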
    yield line[start:]

for line in wrapped_lines(lorem, 30):
    print(line)

Edit: I don't like the version above; for my taste it is a bit ugly and un-Pythonic. Here is another one:

def wrapped_lines(line, width=80):
    whitespace = set(" \n\t\r")
    length = len(line)
    start = 0

    while start < (length - width):
        end = start + width + 1
        chunk = line[start:end]
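        try:
            end = start + chunk.index('\n')
        except ValueError: # no newline in chunk
            # we iterate characters from the end:
            for i, ch in enumerate(reversed(chunk)):
                if ch in whitespace:
                    end -= i + 1 # end is now the index of the last whitespace
                    break
            else:
                # no whitespace in chunk: hard-break the over-long word
                yield line[start:start+width]
                start += width
                continue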
        yield line[start:end]
        start = end + 1
    yield line[start:]

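Answer 1 (score: 1)

Part of a programmer's skill set should be the ability to read and understand source code written by other people. I know you don't want to use the textwrap module, but you can still learn from its source code. To do that you have to reverse-engineer the parts that reflect the mental picture of the problem the author had in mind, and that teaches you how to write such code better yourself.

You can find the textwrap implementation in c:\Python34\Lib\textwrap.py. You can copy it into your working directory under a different name and experiment with it.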
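
If Python is installed somewhere else, the standard inspect module can tell you where the file lives. This is only a quick sketch; the variable names are arbitrary.

```python
import inspect
import textwrap

# Path of the textwrap.py that this interpreter actually imports
source_path = inspect.getsourcefile(textwrap)
print(source_path)

# Source text of the class that does the real work
source = inspect.getsource(textwrap.TextWrapper)
print(source[:300])  # show only the beginning
```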

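Answer 2 (score: 0)

There are several ways to solve this. One approach is to look for the last space before the right margin and split the string there, print the first part, and repeat the search-and-split on the second part.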
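
A minimal sketch of that first approach, under two assumptions of mine: all whitespace runs are collapsed to single spaces first, and a word longer than the width is allowed to overflow instead of being cut. The function name wrap_at_last_space is made up, and it returns the lines instead of printing them.

```python
def wrap_at_last_space(text, width=72):
    """Split text into lines of at most `width` characters, breaking at spaces."""
    text = " ".join(text.split())  # assumption: normalize whitespace first
    lines = []
    while len(text) > width:
        # last space at or before the right margin
        cut = text.rfind(" ", 0, width + 1)
        if cut == -1:
            # the first word is longer than width: let it overflow
            cut = text.find(" ", width)
            if cut == -1:
                break  # nothing left to split
        lines.append(text[:cut])
        text = text[cut + 1:]  # repeat on the remainder
    lines.append(text)
    return lines


for line in wrap_at_last_space("What I'm trying to do is open up a text file", 12):
    print(line)
```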

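Here is another approach: split the text into words, then add the words one by one to a line buffer. If the next word would overflow the line, print the line first and reset it. (As a bonus, this code lets you specify a left margin.)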

def par(s, wrap = 72, margin = 0):
    """Print a word-wrapped paragraph with given width and left margin"""

    left = margin * " "
    line = ""

    for w in s.split():
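        # flush only a non-empty buffer, so an over-long first word
        # does not produce a blank line
        if line and len(line) + len(w) >= wrap:
            print(left + line)
            line = ""

        if line: line += " "
        line += w

    print(left + line)
    print()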



par("""What I'm trying to do is open up a text file with some
    paragraphs and give each line a maximum width of X number  of
    characters.""", 36)

par("""However, I'm having a flaw in my algorithm as this
    will cut out words and it's not going to work. I'm not really
    sure how to go about this. Also I'm not sure how to make it
    change line.""", 36, 44)

par("""I checked textwrap and I don't really want to use it at
    this point since I want to improve my algorithmic skills.""",
        64, 8)

Of course, instead of printing, you could return a multi-line string with newlines or, better still, a list of lines.
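
For instance, a list-returning variant might look roughly like this. The name wrap_words is made up; as in par() above, the width applies to the text only and the margin is added on top of it.

```python
def wrap_words(s, wrap=72, margin=0):
    """Return the word-wrapped lines of s as a list instead of printing them."""
    left = " " * margin
    lines = []
    line = ""
    for w in s.split():
        # +1 accounts for the space that would join the word to the line
        if line and len(line) + 1 + len(w) > wrap:
            lines.append(left + line)
            line = ""
        line = line + " " + w if line else w
    if line:
        lines.append(left + line)
    return lines


print("\n".join(wrap_words("I checked textwrap and I don't really want to use it", 20, 4)))
```

Returning a list leaves it to the caller to print, join, or post-process the lines.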

Answer 3 (score: 0)

test.txt contains:

"""
What I'm trying to do is open up a text file with some paragraphs and give each line a    maximum width of X number of characters.
However, I'm having a flaw in my algorithm as this will cut out words and it's not going to work.
I'm not really sure how to go about this. Also I'm not sure how to make it change line.
"""
with open("test.txt") as f:
    lines = f.readlines()
    max_width = 25 
    result = ""
    col = 0
    for line in lines:
        for word in line.split():
            end_col = col + len(word)
            if col != 0:
                end_col += 1
            if end_col > max_width: 
                result += '\n'
                col = 0    
            if col != 0:
                result += ' ' 
                col += 1
            result += word 
            col += len(word)
    print(result)  # print once, after every line has been wrapped


What I'm trying to do is
open up a text file with
some paragraphs and give
each line a maximum width
of X number of
characters. However, I'm
having a flaw in my
algorithm as this will
cut out words and it's
not going to work. I'm
not really sure how to go
about this. Also I'm not
sure how to make it
change line.

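Answer 4 (score: -2)

To get this right, you first need to decide what to do with anything longer than the defined length. Assuming you want fairly traditional wrapping, where the extra words flow onto the next line, you should have logic similar to the following (note: this is pseudocode):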

for(int lineCount=0; lineCount<totalLines; lineCount++){
    currentLine=lines[lineCount];
    // only lines longer than the target need to be cut
    if(currentLine.length > targetLength){
       int snipStart=currentLine.find_whitespace_before_targetLength;
       snip = currentLine.snip(snipStart, currentLine.length);
       // keep only the part that fits on the current line
       lines[lineCount] = currentLine.snip(0, snipStart);
       if(lineCount<totalLines-1){
         lines[lineCount+1].prepend(snip);
       }else{
         //Add snip to line array, since the last line is too long
       }
    }
}
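
One possible Python reading of that pseudocode is sketched below. The name reflow and its details are my own assumptions: it cuts at the last space at or before the target length, carries the overflow to the front of the next line, and leaves a line untouched if it has no space to cut at.

```python
def reflow(lines, target_length):
    """Carry the overflow of each over-long line onto the start of the next one."""
    lines = [line.rstrip("\n") for line in lines]  # work on a copy
    i = 0
    while i < len(lines):
        current = lines[i]
        if len(current) > target_length:
            # last space at or before the target length
            cut = current.rfind(" ", 0, target_length + 1)
            if cut > 0:
                lines[i] = current[:cut].rstrip()
                snip = current[cut + 1:].lstrip()
                if snip and i + 1 < len(lines):
                    # prepend the overflow to the following line
                    lines[i + 1] = (snip + " " + lines[i + 1]).rstrip()
                elif snip:
                    # the last line was too long: add a new line for the overflow
                    lines.append(snip)
        i += 1
    return lines


print("\n".join(reflow(["aaa bbb ccc ddd", "eee"], 7)))
```

Note that this merges the overflow into whatever follows, so paragraph boundaries are not preserved.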