Join a list of sentences without exceeding a maximum character count

Date: 2019-05-31 21:27:07

Tags: python

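I have a list in which each item is a sentence. I want to join the items as long as the newly merged item does not exceed a character limit.

Joining all the items in a list is easy enough.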

>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> ' '.join(x)
'Alice went to the market. She bought an apple. And she then went to the park.'

Now suppose I want to join the items in order, as long as the newly merged item does not exceed 50 characters.

The result would be:

['Alice went to the market. She bought an apple.','And she then went to the park.']

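You could perhaps use a list comprehension like here, or I could write a conditional iterator like here, but I run into the problem of sentences being cut off.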

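Clarifications

  • The maximum character limit applies to the length of a single item in the list, not to the length of the whole list. After merging, no item in the new list may exceed the limit.
  • Items that cannot be merged stay unchanged and are returned in the list.
  • Combine sentences as long as the result does not exceed the limit. If it would exceed the limit, do not merge and leave them as they are. Only merge sentences that are adjacent in the list.
  • Please make sure your solution produces the output given above: ['Alice went to the market. She bought an apple.','And she then went to the park.']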
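
The rules above can be sketched as a single greedy pass that keeps extending the current group while the joined length stays within the limit. This is only an illustration, not any answerer's method: the function name `merge_sentences` and its parameters are hypothetical, and it assumes a single space as the separator.

```python
def merge_sentences(sentences, limit=50, sep=" "):
    """Greedily merge adjacent sentences without exceeding `limit` characters.

    Hypothetical helper for illustration; a sentence that is already longer
    than `limit` is returned unchanged as its own item.
    """
    merged = []
    current = None  # the group being built, or None if no group is open yet
    for sentence in sentences:
        if current is None:
            current = sentence
        elif len(current) + len(sep) + len(sentence) <= limit:
            # The sentence still fits: extend the current group.
            current = current + sep + sentence
        else:
            # It does not fit: close the group and start a new one.
            merged.append(current)
            current = sentence
    if current is not None:
        merged.append(current)
    return merged


x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
print(merge_sentences(x))
```

Each sentence is visited once, so the cost grows linearly with the length of the list.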

4 Answers:

Answer 0: (score: 2)

Here is a one-line solution, just because it is possible.

[x[i] for i in range(len(x)) if [sum(list(map(len,x))[:j+1]) for j in range(len(x))][i] < 50]

More efficient (the intermediate results avoid recomputation), but still with no explicit loop.

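lens = list(map(len, x))  # length of each sentence (separator spaces not counted)
sums = [sum(lens[:i+1]) for i in range(len(x))]  # running total up to and including item i
[x[i] for i in range(len(x)) if sums[i] < 50]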

That said, I doubt this approach is more efficient than an explicit loop in any realistic case!
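
Note that the sums above count only the sentence lengths, not the spaces that `' '.join` adds. A hedged sketch of one way to account for them, using `itertools.accumulate` for the running totals; the sample list `words` and the name `LIMIT` are made up for illustration, and `<=` is used on the assumption that exactly 50 characters is allowed.

```python
from itertools import accumulate

LIMIT = 50  # assumed limit, taken from the question
words = ['a' * 20, 'b' * 20, 'c' * 9]  # made-up data: raw lengths sum to 49

# Running total of the lengths only (what a plain prefix sum measures).
plain = list(accumulate(map(len, words)))

# Once joined, item i is preceded by i separators, so add i to its running total.
with_spaces = [total + i for i, total in enumerate(plain)]

# Keep the leading items whose joined length stays within the limit.
prefix = [w for w, total in zip(words, with_spaces) if total <= LIMIT]

print(plain[-1] <= LIMIT)              # would all items fit if spaces were ignored?
print(len(' '.join(words)) <= LIMIT)   # do they fit once joined with spaces?
print(len(prefix))                     # number of items kept
```

Here the raw lengths suggest that all three items fit, while the joined string is longer than the limit, so only a shorter prefix is kept.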

Answer 1: (score: 1)

Since you want to keep checking the total length, a list comprehension may be less clear.

A simple function will do. It takes the list and an optional `limit` (default 50) and returns the leading items that fit within that limit once joined.

def join_50_chars_or_less(lst, limit=50):
    """
    Takes in lst of strings and returns the leading strings whose
    space-joined length is at most `limit` chars (no substrings)

    :param lst: (list)
        list of strings to join
    :param limit: (int)
        optional limit on number of chars, default 50
    :return: (list)
        leading elements of lst that can be joined within `limit` chars.
        No partial-strings of elements allowed.
    """
    for i in range(len(lst)):
        new_join = lst[:i+1]
        if len(' '.join(new_join)) > limit:
            return lst[:i]
    return lst

After defining the function:

>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> join_50_chars_or_less(x)
['Alice went to the market.', 'She bought an apple.']
>>> len('Alice went to the market. She bought an apple.')
46

Let's test it against a potentially longer string:

>>> test_str = "Alice went to the market. She bought an apple on Saturday."
>>> len(test_str)
58

>>> test = test_str.split()
>>> test
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on', 'Saturday.']

>>> join_50_chars_or_less(test)
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on']
>>> len(' '.join(join_50_chars_or_less(test)))
48
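
The function above returns only the first group that fits. One possible way to get every group, sketched under the assumption that an oversized sentence should be kept as its own item, is to call it repeatedly on the remaining items. `group_all` is a hypothetical wrapper name; the function is repeated here (without its docstring) so that the snippet runs on its own.

```python
def join_50_chars_or_less(lst, limit=50):
    # Same logic as the function above, docstring omitted.
    for i in range(len(lst)):
        if len(' '.join(lst[:i + 1])) > limit:
            return lst[:i]
    return lst


def group_all(lst, limit=50):
    """Hypothetical wrapper: split `lst` into joined groups of at most `limit` chars."""
    groups = []
    rest = list(lst)
    while rest:
        # Take at least one item so an oversized sentence cannot stall the loop.
        head = join_50_chars_or_less(rest, limit) or rest[:1]
        groups.append(' '.join(head))
        rest = rest[len(head):]
    return groups


x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
print(group_all(x))
```

Because each call re-joins prefixes of the remaining list, this is simple rather than fast; for long lists a single pass that tracks the running length would avoid the repeated joins.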

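Answer 2: (score: 1)

You can use accumulate from itertools to compute the size of the accumulated string (plus separators) and determine the maximum number of items that can be combined.

After that, you can decide whether to combine them, and you will also know which items did not fit.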

s = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']

from itertools import accumulate
maxCount = sum( size+sep<=50 for sep,size in enumerate(accumulate(map(len,s))) )
combined = " ".join(s[:maxCount])
unused   = s[maxCount:]

print(combined,unused)
# Alice went to the market. She bought an apple. ['And she then went to the park.']                    

You can also get maxCount in a more brute-force (and less efficient) way, without using accumulate:

maxCount = sum(len(" ".join(s[:n+1]))<=50 for n in range(len(s)))

Or you can do the whole thing in one line:

items = next(s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50 )

# ['Alice went to the market.', 'She bought an apple.']

unused = s[len(items):]

# ['And she then went to the park.']

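If you need to perform several combinations on the list to produce a new list of combined sentences (as per the latest edit to the question), you can use it in a loop: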

combined = []
s        = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
while s:
    items = next((s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50), s[:1])
    combined.append(" ".join(items))
    s = s[len(items):]

print(combined)
# ['Alice went to the market. She bought an apple.', 'And she then went to the park.'] 

Edit: changed the call to next() to add a default value. This handles sentences that are already longer than 50 characters.
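
To illustrate the edit described above: without a default, `next()` raises `StopIteration` when no prefix fits, which happens if the first sentence alone is over the limit. A small sketch with a made-up oversized sentence (`long_s` and `fitting_prefixes` are illustrative names, not part of the answer):

```python
long_s = ['x' * 60, 'short one.']  # made-up data: the first item alone exceeds 50


def fitting_prefixes(s):
    # Longest prefix first, as in the one-liner above.
    return (s[:n] for n in range(len(s), 0, -1) if len(" ".join(s[:n])) <= 50)


try:
    next(fitting_prefixes(long_s))
    raised = False
except StopIteration:
    raised = True  # no prefix fits, so the generator is empty

# With a default, the oversized sentence is returned on its own.
items = next(fitting_prefixes(long_s), long_s[:1])
print(raised, items == long_s[:1])
```

Supplying `s[:1]` as the default also guarantees that the loop consumes at least one item per iteration, so it always terminates.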

Answer 3: (score: 0)

A less elegant solution:

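result = []
string = ""
for element in x:
    # Candidate group if this sentence were appended to the current one
    candidate = element if not string else string + " " + element
    if len(candidate) <= 50:
        string = candidate
    else:
        # Adding the sentence would exceed the limit: close the current group
        if string:
            result.append(string)
        string = element
if len(string) > 0:
    result.append(string)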