Question

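I have a Python method that takes a list of tuples of the form (string, float) and returns a list of strings that, when joined, should not exceed a certain limit. I do not split sentences to hit the output length exactly, but I make sure the result stays within one sentence's length of the desired output length.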

For example:
s: [('Where are you',1),('What about the next day',2),('When is the next event',3)]

max_length: 5
Output: 'Where are you What about the next day'

max_length: 3
Output: 'Where are you'

This is what I am doing:

l = 0
output = []
for s in s_tuples:
    if l <= max_length:
        output.append(s[0])
        l += len(get_words_from(s[0]))
return ''.join(output)

Is there a smarter way to make sure the output word count does not exceed max_length, other than stopping when the length is reached?

Answer 1

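First, I see no reason to defer breaking out of the loop until the next iteration once the maximum length has been reached.

So, altering your code, I came up with the following: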

s_tuples = [('Where are you',1),('What about the next day',2),('When is the next event',3)]


def get_words_number(s):
    return len(s.split())


def truncate(s_tuples, max_length):
    tot_len = 0
    output = []
    for s in s_tuples:
        output.append(s[0])
        tot_len += get_words_number(s[0])
        if tot_len >= max_length:
            break
    return ' '.join(output)


print(truncate(s_tuples, 3))

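Second, I really don't like creating the temporary object output. We can feed the join method an iterator that walks the initial list without duplicating information.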

def truncate(s_tuples, max_length):

    def stop_iterator(s_tuples):
        tot_len = 0
        for s,num in s_tuples:
            yield s
            tot_len += get_words_number(s)
            if tot_len >= max_length:
                break

    return ' '.join(stop_iterator(s_tuples))


print(truncate(s_tuples, 3))

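Also, in your example the output is slightly larger than the maximum number of words you set. If you want the number of words never to exceed the limit (while still being as large as possible), place the yield after the limit check: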

def truncate(s_tuples, max_length):

    def stop_iterator(s_tuples):
        tot_len = 0
        for s,num in s_tuples:
            tot_len += get_words_number(s)
            if tot_len >= max_length:
                if tot_len == max_length:
                    yield s
                break
            yield s

    return ' '.join(stop_iterator(s_tuples))


print(truncate(s_tuples, 5))
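
The "never exceed the limit" variant can also be sketched by checking the running total before yielding. This is a minimal sketch rather than the answer's exact code: the name truncate_within is made up for illustration, and words are assumed to be whitespace-separated.

```python
def truncate_within(s_tuples, max_length):
    """Join leading sentences while the total word count stays <= max_length."""

    def stop_iterator():
        tot_len = 0
        for s, _num in s_tuples:
            n_words = len(s.split())  # assumption: words are whitespace-separated
            if tot_len + n_words > max_length:
                break  # adding this sentence would exceed the limit
            tot_len += n_words
            yield s

    return ' '.join(stop_iterator())


s_tuples = [('Where are you', 1),
            ('What about the next day', 2),
            ('When is the next event', 3)]

print(truncate_within(s_tuples, 5))  # Where are you
print(truncate_within(s_tuples, 8))  # Where are you What about the next day
```

If even the first sentence is longer than the limit, this sketch returns an empty string.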

Answer 2

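What is max_length supposed to control? The total number of words in the returned list? I would have expected a max_length of five to yield at most 5 words, not 8.

Edit: I would keep two lists because I think that is easy to read, but some people may not like the extra overhead: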

def restrictWords(givenList, whenToStop):
    outputList = []
    wordList = []
    for pair in givenList:
        stringToCheck = pair[0]
        listOfWords = stringToCheck.split()
        for word in listOfWords:
            wordList.append(word)
        outputList.append( stringToCheck )
        if len( wordList ) >= whenToStop:
            break
    return outputList

So with

testList = [ ('one two three',1),
             ('four five',2),
             ('six seven eight nine',3) ]

2 should give you ['one two three'], 3 should give you ['one two three'], and 4 should give you ['one two three', 'four five'].
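
These expected results can be checked with a short script. The sketch below is a variant, not the answer's own code: it keeps a running word count instead of the second list, and the name restrict_words is hypothetical.

```python
def restrict_words(given_list, when_to_stop):
    """Collect strings until the running word count reaches when_to_stop."""
    output_list = []
    word_count = 0
    for string_to_check, _weight in given_list:
        word_count += len(string_to_check.split())
        output_list.append(string_to_check)
        if word_count >= when_to_stop:
            break
    return output_list


test_list = [('one two three', 1),
             ('four five', 2),
             ('six seven eight nine', 3)]

# Print the result for each of the limits discussed above.
for limit in (2, 3, 4):
    print(limit, restrict_words(test_list, limit))
```

Dropping the word list avoids the extra overhead while keeping the same stopping rule.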

Answer 3

A smarter way is to break out of the loop as soon as max_length is exceeded, so that you don't iterate over the rest of the list for no reason:

for s in s_tuples:
    if l > max_length:
        break
    output.append(s[0])
    l += len(get_words_from(s[0]))
return ''.join(output)

Answer 4

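Your code does not stop when the limit is reached. "max_length" is a bad name: it is not a "maximum length", because your code allows it to be exceeded (as in your first example). Is that deliberate? "l" is a bad name; let's call it tot_len. You even keep going when tot_len == max_length. Your example shows joining with a space, but your code doesn't do that.

You probably need something like this: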

tot_len = 0
output = []
for s in s_tuples:
    if tot_len >= max_length:
        break
    output.append(s[0])
    tot_len += len(get_words_from(s[0]))
return ' '.join(output)
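
To try this out, the fragment can be wrapped in a function. In this sketch, get_words_from is assumed to split on whitespace (the question does not show it), and the function name join_until_limit is made up for illustration.

```python
def get_words_from(sentence):
    # Assumption: the question does not show this helper; split on whitespace.
    return sentence.split()


def join_until_limit(s_tuples, max_length):
    tot_len = 0
    output = []
    for s in s_tuples:
        if tot_len >= max_length:
            break  # limit already reached by the earlier sentences
        output.append(s[0])
        tot_len += len(get_words_from(s[0]))
    return ' '.join(output)


s_tuples = [('Where are you', 1),
            ('What about the next day', 2),
            ('When is the next event', 3)]

print(join_until_limit(s_tuples, 5))  # Where are you What about the next day
print(join_until_limit(s_tuples, 3))  # Where are you
```

With the question's data this reproduces both example outputs, so the result can still exceed max_length by up to one sentence.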

Answer 5

If NumPy is available, the following solution, which uses a list comprehension, works.

import numpy as np

# Get the index of the last clause to append.
s_cumlen = np.cumsum([len(s[0].split()) for s in s_tuples])
append_until = np.sum(s_cumlen < max_length)

return ' '.join([s[0] for s in s_tuples[:append_until+1]])

For clarity, s_cumlen contains the cumulative sum of the strings' word counts.

>>> s_cumlen
array([ 3,  8, 13])
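
If NumPy is not available, the same cumulative-sum idea can be sketched with itertools.accumulate from the standard library. This swaps the NumPy calls for plain Python; the function name truncate_cumsum is hypothetical.

```python
from itertools import accumulate


def truncate_cumsum(s_tuples, max_length):
    # Running total of word counts, e.g. [3, 8, 13] for the question's data.
    s_cumlen = list(accumulate(len(s[0].split()) for s in s_tuples))
    # Number of sentences whose cumulative count is still below the limit.
    append_until = sum(1 for total in s_cumlen if total < max_length)
    # Include one more sentence, mirroring the slice in the NumPy version.
    return ' '.join(s[0] for s in s_tuples[:append_until + 1])


s_tuples = [('Where are you', 1),
            ('What about the next day', 2),
            ('When is the next event', 3)]

print(truncate_cumsum(s_tuples, 5))  # Where are you What about the next day
print(truncate_cumsum(s_tuples, 3))  # Where are you
```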
