Trying to print a single string across multiple fixed-length lines while minimizing the cost

Date: 2011-10-22 15:07:00

Tags: python algorithm

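I've only just started with algorithms (this is my first exposure, and I now feel I lack the logic and reasoning skills to be good at it). I've been trying to print "This is a sample text" across several lines, with at most 7 characters per line, so the first line would have: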

this is (no spaces left at the end, so cost 0)
a [cost=6*6*6] (the spaces left at the end of each line are cubed, and that is the cost)
sample [cost=1*1*1]
text [cost=3*3*3]

(Total cost = 0+216+1+27 = 244)

This can be improved to:
this [cost 3*3*3]
is a [cost 3*3*3]
sample [cost 1*1*1]
text [cost 3*3*3]

[Total cost = 27+27+1+27 = 82]
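To make the arithmetic above concrete, here is a minimal sketch (the names `layout_cost` and `greedy_wrap` are hypothetical, not from the question). It charges every line, including the last, exactly as in the two totals above, and shows that a greedy fill, which packs each line as full as possible, lands on the 244 layout rather than the 82 one:

```python
def layout_cost(lines, width):
    """Sum of cubed trailing spaces; the last line is charged too, as in the totals above."""
    return sum((width - len(line)) ** 3 for line in lines)


def greedy_wrap(words, width):
    """Hypothetical greedy fill: add words to the current line until the next one won't fit."""
    lines = []
    current = ""
    for word in words:
        if current and len(current) + 1 + len(word) > width:
            lines.append(current)  # next word doesn't fit: close the current line
            current = word
        else:
            current = word if not current else current + " " + word
    if current:
        lines.append(current)
    return lines


greedy = greedy_wrap("this is a sample text".split(), 7)
print(greedy)                                              # ['this is', 'a', 'sample', 'text']
print(layout_cost(greedy, 7))                              # 244
print(layout_cost(["this", "is a", "sample", "text"], 7))  # 82
```

Filling "this is" completely forces "a" onto a nearly empty line, which is why a locally best choice is not globally best here.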

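Clearly a greedy approach won't work here and dynamic programming is needed instead, but my problem is that I can't figure out the substructure that gets reused. I'd really like to work out how to tie the cost condition to the printing in Python. I can index each word and I can get the length of each word, which is roughly what I did next, but when I print, all that happens is the whole string gets printed one line at a time (this is where I am so far). Sorry if this is a really silly question, but I'm stuck and really need some help. Thanks.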


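This is how I tried to implement it. I've been running some tests against the code (the tests were written by a friend of mine), and I don't think I've got it right. Any help or suggestions are appreciated.

print_test.py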

import os
import sys
from glob import glob

# TODO -- replace this with your solution
from printing import print_neatly

log = open('output.log', 'w')

# This tests the code against my own text
maxline = 80
for source in glob('*.txt'):
    with open(source) as f:
        fulltext = f.read()

    words = fulltext.split()
    (cost, text) = print_neatly(words, maxline)

    # double check the cost
    # lines = text.split('\n')
    truecost = 0
    # text[0:-1]: every line except the last one
    for line in text[0:-1]:
        truecost += (maxline - len(line)) ** 3

    # print the output and cost
    print('----------------------', file=log)
    print(source, file=log)
    print('----------------------', file=log)
    print(text, file=log)
    print('----------------------', file=log)
    print('cost = ', cost, file=log)
    print('true cost = ', truecost, file=log)
    print('----------------------', file=log)

log.close()

# print the log
with open('output.log') as f:
    print(f.read())

printing.py

def print_neatly(wordlist, max):
    # strings = 'This is a sample string'

    # splitting the string and taking out words from it
    # wordlist = strings.split()
    (cost, dyn_print) = print_line(wordlist, len(wordlist), max)
    for dyn in dyn_print:
        print(dyn)
    return cost, dyn_print


def cost(lines, max):
    # every line, including the last one, is charged (max - len) ** 3
    return sum((max - len(x)) ** 3 for x in lines)


def print_line(wordlist, count, max, results={}):
    # note: `count` and `results` are reassigned below, so those arguments are unused
    results = [([], 0)]
    for count in range(1, len(wordlist) + 1):
        best = wordlist[:count]
        best_cost = cost(best, max)
        mycount = count - 1
        line = wordlist[mycount]
        while len(line) <= max:
            attempt, attempt_cost = results[mycount]
            attempt = attempt + [line]
            attempt_cost += cost([line], max)
            if attempt_cost < best_cost:
                best = attempt
                best_cost = attempt_cost
            if mycount > 0:
                mycount -= 1
                line = wordlist[mycount] + ' ' + line
            else:
                break
        results += [(best, best_cost)]

    # print(best)
    # print(best_cost)
    return (best_cost, best)


# print_neatly(0, 7)

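Running it on the test text file gives me the output below. The two costs should be the same, but they aren't. Can someone point out where I went wrong?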


cost = 16036

true cost = 15911
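Reading the code above, the two numbers are not computed the same way: `cost()` in printing.py sums `(max - len(x)) ** 3` over every line, while the check in print_test.py loops over `text[0:-1]` and so leaves out the last line. A minimal sketch of the two computations on the 7-character example (both function names are hypothetical):

```python
def cost_all_lines(lines, width):
    # what cost() in printing.py computes: every line is charged
    return sum((width - len(line)) ** 3 for line in lines)


def cost_without_last(lines, width):
    # what the text[0:-1] loop in print_test.py computes: the last line is skipped
    return sum((width - len(line)) ** 3 for line in lines[:-1])


lines = ["this", "is a", "sample", "text"]
print(cost_all_lines(lines, 7))     # 82
print(cost_without_last(lines, 7))  # 55, i.e. 82 minus the (7 - 4) ** 3 of "text"
```

The reported gap, 16036 - 15911 = 125 = 5 ** 3, would fit a last line that is 5 characters short of the 80-character width; that is an inference from the numbers, not something the output shows directly. Whichever convention is intended, the solver and the check have to use the same one.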

3 answers:

Answer 0 (score: 2)

One approach is to list all the possible alternatives and pick the one with the lowest cost:

from functools import wraps

def cache(origfunc):
    d = {}
    @wraps(origfunc)
    def wrapper(*args):
        if args in d:
            return d[args]
        result = origfunc(*args)
        d[args] = result
        return result
    return wrapper

@cache
def alternatives(t, m=7):
    ''' Given a tuple of word lengths and a maximum line length,
        return a list of all possible line groupings
        showing the total length of each line.

        >>> alternatives((4, 2, 1, 3), 7)
        [[4, 2, 1, 3], [4, 2, 5], [4, 4, 3], [7, 1, 3], [7, 5]]

    '''
    if not t:
        return []
    alts = []
    s = 0
    for i, x in enumerate(t):
        s += x
        if s > m:
            break
        tail = t[i+1:]
        if not tail:
            alts.append([s])
            break
        for subalt in alternatives(tail, m):
            alts.append([s] + subalt)
        s += 1
    return alts

def cost(t, m=7):
    ''' Evaluate the cost of lines given their line lengths

            >>> cost((7, 1, 6, 4), m=7)  # 'this is', 'a', 'sample', 'text'
            244
            >>> cost((4, 4, 6, 4))       # 'this', 'is a', 'sample', 'text'
            82

    '''
    return sum((m - x) ** 3 for x in t)

def textwrap(s, m=7):
    ''' Given a string, return a list of strings with optimal line wrapping

        >>> textwrap('This is a sample text', 7)
        ['This', 'is a', 'sample', 'text']

    '''
    words = s.split()
    t = tuple(map(len, words))
    # score each grouping with the same width m (cost() defaults to 7)
    lengths = min(alternatives(t, m), key=lambda lens: cost(lens, m))
    result = []
    worditer = iter(words)
    for length in lengths:
        line = []
        s = 0
        while s < length:
            word = next(worditer)
            line.append(word)
            s += len(word) + 1
        result.append(' '.join(line))
    return result


if __name__ == '__main__':
    import doctest
    print(doctest.testmod())

The code can be sped up by limiting the number of alternatives searched (perhaps to only the three longest alternatives on each line).
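One possible reading of that suggestion, sketched below under an assumption the answer does not spell out: "the three longest alternatives on each line" is taken to mean that, at each line break, only the (up to) `keep` longest first lines that still fit are explored. `pruned_alternatives` and `keep` are hypothetical names, and the recursion walks word indices instead of slicing tuples:

```python
from functools import lru_cache


def pruned_alternatives(t, m=7, keep=3):
    """Groupings of word lengths `t` into lines of at most `m` characters,
    exploring only the `keep` longest feasible first lines at each step."""

    @lru_cache(maxsize=None)
    def from_index(start):
        if start == len(t):
            return [()]                      # exactly one way to lay out zero words
        firsts = []                          # (line length, index of next word)
        s = 0
        for i in range(start, len(t)):
            s += t[i]
            if s > m:
                break
            firsts.append((s, i + 1))
            s += 1                           # space before the next word
        groupings = []
        for length, nxt in firsts[max(0, len(firsts) - keep):]:  # longest candidates only
            for rest in from_index(nxt):
                groupings.append((length,) + rest)
        return groupings

    return [list(g) for g in from_index(0)]


def cost(t, m=7):
    return sum((m - x) ** 3 for x in t)


lengths = tuple(map(len, "This is a sample text".split()))
best = min(pruned_alternatives(lengths, 7), key=lambda g: cost(g, 7))
print(best)  # [4, 4, 6, 4]
```

Pruning trades exactness for speed: the optimum is no longer guaranteed to be among the candidates, and the number of groupings still grows exponentially with the number of lines, only with a smaller base. If some word is longer than `m`, the list is empty and `min()` raises `ValueError`.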

Answer 1 (score: 2)

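If there's a "best" way to arrange one word, two words, etc. into lines, it won't change based on the lines that come later. It can change based on the words that come later, if those words are small enough to join others on a line. But if we take those words in isolation and try to arrange them into lines, the same set of solutions will always be optimal. (There may be ties; for example, under this cost function, "the cat in the hat" on 7-char lines has two solutions. Both are "best", and always will be, and we can pick either one and stick with it without sacrificing correctness.)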

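  • "This" is always best as ["This"]. (Note: I'm not saying it's always best on a line by itself! What I'm saying is that if you have a single word, the only optimal way to arrange it is on one line.)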

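  • "This is" can be arranged as ["This", "is"] or ["This is"]. The latter, however, is best. So from here on, whenever we consider only these two words, we can ignore ["This", "is"] entirely; it will never be better.

  • "This is a" can be arranged as ["This", "is", "a"], ["This is", "a"], or ["This", "is a"]. (We already know ["This is"] beats ["This", "is"]; see the previous bullet!) It turns out ["This", "is a"] is best. So from here on we can ignore ["This is", "a"].

  • "This is a sample" can be arranged as:

    • ["This", "is", "a", "sample"] (see bullet #2; we don't even need to look at this one)
    • ["This is", "a", "sample"] (see bullet #3)
    • ["This", "is a", "sample"]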
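The bullets above describe the optimal-substructure property. Written as a recurrence (notation introduced here for illustration, not taken from the answer): let $W$ be the line limit, $|w_k|$ the length of the $k$-th word, $\ell(i, j)$ the length of a line holding words $i+1$ through $j$, and $c(j)$ the best cost for the first $j$ words, with every line charged as in the code that follows and every word assumed to fit on a line:

$$
\ell(i, j) = \sum_{k=i+1}^{j} |w_k| + (j - i - 1)
$$

$$
c(0) = 0, \qquad
c(j) = \min_{\substack{0 \le i < j \\ \ell(i, j) \le W}} \Bigl[\, c(i) + \bigl(W - \ell(i, j)\bigr)^3 \Bigr]
$$

The optimum for $n$ words is $c(n)$. For "This is a sample text" with $W = 7$, the first 1 through 5 words give $c = 27, 0, 54, 55, 82$; the value $c(3) = 54$ is the ["This", "is a"] arrangement from the third bullet.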

I don't know Python; I just hacked this together. Forgive me if it's "non-Pythonic" or whatever. :P

def cost(lines, limit):
    # figures the cost of the current arrangement of words in lines.
    return sum([(limit-len(x)) ** 3 for x in lines])


def lineify(words, limit):
    # splits up words into lines of at most (limit) chars.
    # should find an optimal solution, assuming all words are < limit chars long

    results = [([], 0)]

    for count in range(1, len(words) + 1):
        best = words[:count]         # (start off assuming one word per line)
        best_cost = cost(best, limit)
        mycount = count - 1
        line = words[mycount]        # start with one word

        while len(line) <= limit:
            # figure the optimal cost, assuming the other words are on another line
            attempt, attempt_cost = results[mycount]
            attempt = attempt + [line]
            attempt_cost += cost([line],limit)
            # print attempt
            if attempt_cost < best_cost:
                best = attempt
                best_cost = attempt_cost

            # steal another word.  if there isn't one, we're done
            if mycount > 0:
                mycount -= 1
                line = words[mycount] + ' ' + line
            else:
                break

        # once we have an optimal result for (count) words, save it for posterity
        results += [(best, best_cost)]

    return results[len(words)][0]


def wrap(phrase, limit):
    # helper function...so the caller doesn't have to pass an array of words.
    # they shouldn't need to know to do that
    words = phrase.split()
    return lineify(words, limit)

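I originally had a recursive solution, but it turns out Python puts limits on recursion that made it unsuitable once reasonably sized texts and real-world line lengths come into play. (You have to recurse all the way back to the beginning before anything gets memoized, so with more than 1000 words I'd hit the recursion limit. That could be stretched by starting with enough words to fill the last line, but it would still cap the maximum at some multiple of the original limit.) I found myself using hacks to build up the results until the recursion limit was no longer a problem. But if you have to do that, it's probably a sign that the recursion itself is the problem.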
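Along the lines of the remark above, the same DP can be written with no recursion and without copying a list of lines for every prefix: keep only the best cost and a back-pointer for each prefix, then walk the back-pointers once at the end. A minimal sketch that assumes, as `lineify` does, that every word fits within `limit` (`wrap_lines` is a hypothetical name, and the cost convention matches `lineify`, charging every line):

```python
def wrap_lines(words, limit):
    """best[j]: minimal cost of words[:j]; prev[j]: index where its last line starts."""
    if any(len(w) > limit for w in words):
        raise ValueError("every word must fit within the limit")
    n = len(words)
    best = [0] + [float("inf")] * n
    prev = [0] * (n + 1)
    for j in range(1, n + 1):
        length = -1                      # cancels the space counted before the first word
        for i in range(j - 1, -1, -1):   # candidate last line: words[i:j]
            length += len(words[i]) + 1
            if length > limit:
                break                    # adding more words only makes the line longer
            candidate = best[i] + (limit - length) ** 3
            if candidate < best[j]:
                best[j] = candidate
                prev[j] = i
    # walk the back-pointers from the end to recover the lines
    lines = []
    j = n
    while j > 0:
        i = prev[j]
        lines.append(" ".join(words[i:j]))
        j = i
    return best[n], lines[::-1]


print(wrap_lines("This is a sample text".split(), 7))
# (82, ['This', 'is a', 'sample', 'text'])
```

Each prefix only looks back until the candidate line overflows, so the work is roughly the number of words times the number of words that fit on one line, and the recursion limit never comes into play.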

Answer 2 (score: 0)

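The algorithm relies on the assumption that if we know the optimal solutions for the last N-1, N-2, ..., 2, 1 words of the text, it is easy to construct the optimal solution for N words. Memoization avoids recomputing the result of best_partition() for the same input: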

import functools

def wrap(text, width):
    """
    >>> wrap('This is a sample text', 7)
    ['This', 'is a', 'sample', 'text']
    """
    return [' '.join(line) for line in best_partition(
        tuple(text.split()), functools.partial(cost, width=width))]

def best_partition(words, cost):
    """The best partition of words into lines according to the cost function."""
    best = [words] # start with all words on a single line
    for i in reversed(range(1, len(words))): # reverse to avoid recursion limit
        lines = [words[:i]] + best_partition(words[i:], cost)
        if cost(lines) < cost(best):
            best = lines
    return best

def memoize(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        try: return cache[args]
        except KeyError:
            ret = cache[args] = func(*args)
            return ret
    return wrapper

best_partition = memoize(best_partition)

where cost() is:

def linelen(words):
    """Number of characters in a line created from words."""
    if not words: return 0
    # words + spaces between them
    return sum(map(len, words)) + len(words) - 1

def cost(lines, width):
    """
    - each line except last costs `(width - w)**3`, where `w` is the
      line width

    - cost is infinite if `w > width` and the line has more than one word

    >>> cost([['a'], ['b']], 1)
    0
    >>> cost([['a','b']], 1)
    inf
    >>> cost([['a'], ['b']], 3)
    8
    >>> cost([['a', 'b']], 2)
    inf
    """
    if not lines: return 0
    s = 0
    for i, words in enumerate(lines, 1):
        w = linelen(words)
        if width >= w:
            if i != len(lines): # last line has zero cost
                s += (width - w)**3
        elif len(words) != 1: # more than one word in the line
            return float("inf") # penalty for w > width
    return s

Example

import sys

print('\n'.join(wrap("""
    In olden times when wishing still helped one, there lived a king whose
    daughters were all beautiful, but the youngest was so beautiful that
    the sun itself, which has seen so much, was astonished whenever it
    shone in her face. Close by the king's castle lay a great dark forest,
    and under an old lime-tree in the forest was a well, and when the day
    was very warm, the king's child went out into the forest and sat down
    by the side of the cool fountain, and when she was bored she took a
    golden ball, and threw it up on high and caught it, and this ball was
    her favorite plaything.
    """, int(sys.argv[1]) if len(sys.argv) > 1 else 70)))

Output

In olden times when wishing still helped one, there lived a king whose
daughters were all beautiful, but the youngest was so beautiful that
the sun itself, which has seen so much, was astonished whenever it
shone in her face. Close by the king's castle lay a great dark forest,
and under an old lime-tree in the forest was a well, and when the day
was very warm, the king's child went out into the forest and sat down
by the side of the cool fountain, and when she was bored she took a
golden ball, and threw it up on high and caught it, and this ball was
her favorite plaything.
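Since Python 3.2, `functools.lru_cache` can stand in for the hand-written `memoize` decorator in this answer. A minimal self-contained sketch of Answer 2 with only that change; `linelen`, `cost`, and the search follow the code above, with `cost` condensed but keeping the same rules (last line free, infinite cost for an overlong multi-word line):

```python
import functools


def linelen(words):
    """Number of characters in a line built from `words`."""
    return sum(map(len, words)) + len(words) - 1 if words else 0


def cost(lines, width):
    """(width - w)**3 per line except the last; inf if a multi-word line is too wide."""
    s = 0
    for i, words in enumerate(lines, 1):
        w = linelen(words)
        if w > width:
            if len(words) != 1:
                return float("inf")
        elif i != len(lines):
            s += (width - w) ** 3
    return s


@functools.lru_cache(maxsize=None)
def best_partition(words, cost):
    """Same search as above; lru_cache replaces the hand-written memoize."""
    best = [words]  # start with all words on a single line
    for i in reversed(range(1, len(words))):  # shortest suffixes first
        lines = [words[:i]] + best_partition(words[i:], cost)
        if cost(lines) < cost(best):
            best = lines
    return best


def wrap(text, width):
    cost_for_width = functools.partial(cost, width=width)
    return [' '.join(line) for line in
            best_partition(tuple(text.split()), cost_for_width)]


print(wrap('This is a sample text', 7))  # ['This', 'is a', 'sample', 'text']
```

As with the original `memoize`, the cache key includes the `functools.partial` object, which hashes by identity, so every call to `wrap()` adds fresh entries; calling `best_partition.cache_clear()` between texts keeps memory bounded.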