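I have a list in which each item is a sentence. I want to join the items, as long as each newly joined item does not exceed a character limit.
You can join the items in a list fairly easily: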
>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> ' '.join(x)
'Alice went to the market. She bought an apple. And she then went to the park.'
Now say I want to join the items in order, as long as each newly joined item does not exceed 50 characters.
The result would be:
['Alice went to the market. She bought an apple.','And she then went to the park.']
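You could perhaps use a list comprehension like here. Alternatively, I could write a conditional iterator like here. But I keep running into the problem of sentences getting cut off.
Clarification: the expected output is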
['Alice went to the market. She bought an apple.','And she then went to the park.']
Answer 0 (score: 2)
Here is a one-line solution, just because it's possible:
[x[i] for i in range(len(x)) if [sum(list(map(len,x))[:j+1]) for j in range(len(x))][i] < 50]
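The one-liner above sums the sentence lengths without counting the spaces that ' '.join inserts between them, so the value it compares against 50 is smaller than the real joined length. For the sample list this does not change the result, but it can matter near the limit. A small check of the difference, using the same sample list x:

```python
from itertools import accumulate

x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']

# Cumulative sum of sentence lengths, ignoring the joining spaces
raw_totals = list(accumulate(map(len, x)))

# Actual length of ' '.join(x[:k+1]): one extra space per sentence after the first
joined_totals = [total + i for i, total in enumerate(raw_totals)]

print(raw_totals)     # [25, 45, 75]
print(joined_totals)  # [25, 46, 77]
```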
More efficient, since the intermediate results avoid recomputation, but still with no explicit loop:
lens = list(map(len, x))
sums = [sum(lens[:i+1]) for i in range(len(x))]
[x[i] for i in range(len(x)) if sums[i] < 50]
Still, I doubt this approach is more efficient than an explicit loop in any realistic situation!
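For comparison, a minimal sketch of the explicit loop mentioned above, assuming every group is wanted (not only the first), as in the clarified expected output. group_sentences is a hypothetical name; it keeps a running length so each sentence is examined once, and a sentence longer than the limit ends up in a group of its own.

```python
def group_sentences(sentences, limit=50, sep=" "):
    """Greedily join consecutive sentences into strings of at most `limit` chars.

    Sentences are never split; a single sentence longer than `limit`
    is placed in a group of its own.
    """
    groups = []
    current = []      # sentences in the group being built
    current_len = 0   # length of sep.join(current)
    for sentence in sentences:
        # Length the group would have if this sentence were appended
        added = len(sentence) if not current else current_len + len(sep) + len(sentence)
        if current and added > limit:
            # Group is full: close it and start a new one with this sentence
            groups.append(sep.join(current))
            current, current_len = [sentence], len(sentence)
        else:
            current.append(sentence)
            current_len = added
    if current:
        groups.append(sep.join(current))
    return groups


x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
print(group_sentences(x))
# ['Alice went to the market. She bought an apple.', 'And she then went to the park.']
```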
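Answer 1 (score: 1)
Since you want to keep checking the total length, a list comprehension would probably be less clear.
A simple function will do. It takes the list and an optional limit (default 50) and returns the longest leading run of items whose space-joined length stays within that limit.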
def join_50_chars_or_less(lst, limit=50):
    """
    Takes in lst of strings and returns the leading strings that fit,
    when joined with spaces, within `limit` chars (no substrings)
    :param lst: (list)
        list of strings to join
    :param limit: (int)
        optional limit on number of chars, default 50
    :return: (list)
        leading elements of lst whose space-joined length is <= limit.
        No partial strings of elements allowed.
    """
    for i in range(len(lst)):
        new_join = lst[:i+1]
        if len(' '.join(new_join)) > limit:
            return lst[:i]
    return lst
Once the function is defined:
>>> x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
>>> join_50_chars_or_less(x)
['Alice went to the market.', 'She bought an apple.']
>>> len('Alice went to the market. She bought an apple.')
46
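join_50_chars_or_less rebuilds and measures ' '.join(lst[:i+1]) for every i, so its total work grows roughly quadratically with the number of items. A sketch of the same check with a running total instead; take_while_fits is a hypothetical name, and it should return the same prefix as join_50_chars_or_less.

```python
def take_while_fits(lst, limit=50, sep=" "):
    """Return the longest prefix of lst whose sep-joined length is <= limit."""
    total = -len(sep)  # cancels the separator counted for the first item
    for i, item in enumerate(lst):
        total += len(sep) + len(item)  # running length of sep.join(lst[:i+1])
        if total > limit:
            return lst[:i]
    return lst


x = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
print(take_while_fits(x))  # ['Alice went to the market.', 'She bought an apple.']
```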
Let's test it with a potentially longer string:
>>> test_str = "Alice went to the market. She bought an apple on Saturday."
>>> len(test_str)
58
>>> test = test_str.split()
>>> test
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on', 'Saturday.']
>>> join_50_chars_or_less(test)
['Alice', 'went', 'to', 'the', 'market.', 'She', 'bought', 'an', 'apple', 'on']
>>> len(' '.join(join_50_chars_or_less(test)))
48
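When the items are individual words, as in the test above, the standard library's textwrap.wrap performs the same greedy grouping directly on the original string and returns the lines as strings. This only fits the word-level case: applied to a string of sentences it breaks wherever a word boundary falls, so it would cut sentences. A sketch, passing break_long_words=False so that a single word longer than the width is kept whole:

```python
import textwrap

test_str = "Alice went to the market. She bought an apple on Saturday."

# Greedy word-level wrapping at 50 characters per line
lines = textwrap.wrap(test_str, width=50, break_long_words=False)
print(lines)  # ['Alice went to the market. She bought an apple on', 'Saturday.']
```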
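Answer 2 (score: 1)
You can use accumulate from itertools to compute the size of the accumulated strings (plus separators) and determine the maximum number of items that can be combined.
After that, you can decide to combine them, and you will also know which items don't fit.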
s = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
from itertools import accumulate
maxCount = sum( size+sep<=50 for sep,size in enumerate(accumulate(map(len,s))) )
combined = " ".join(s[:maxCount])
unused = s[maxCount:]
print(combined,unused)
# Alice went to the market. She bought an apple. ['And she then went to the park.']
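Because the accumulated sizes only ever grow, maxCount is simply the number of leading sizes that stay within the limit, so a binary search with bisect.bisect_right can stand in for the sum over all of them. The gain is small here, since building the list of sizes is already linear, but the same sorted list can then answer several different limits cheaply. A sketch, reusing the names s and maxCount from above:

```python
from bisect import bisect_right
from itertools import accumulate

s = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
limit = 50

# sizes[k] == len(" ".join(s[:k+1])): running length plus one space per extra item
sizes = [size + i for i, size in enumerate(accumulate(map(len, s)))]

# sizes is strictly increasing, so the count of entries <= limit
# is the insertion point returned by a binary search
maxCount = bisect_right(sizes, limit)

print(maxCount)                 # 2
print(" ".join(s[:maxCount]))   # Alice went to the market. She bought an apple.
```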
You can also get maxCount in a more brute-force (and inefficient) way, without accumulate:
maxCount = sum(len(" ".join(s[:n+1]))<=50 for n in range(len(s)))
Or you can do the whole thing in one line:
items = next(s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50 )
# ['Alice went to the market.', 'She bought an apple.']
unused = s[len(items):]
# ['And she then went to the park.']
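If you need to make several combinations from the list to produce a new list of combined sentences (per the latest edit to the question), you can use it in a loop: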
combined = []
s = ['Alice went to the market.', 'She bought an apple.', 'And she then went to the park.']
while s:
items = next((s[:n] for n in range(len(s),0,-1) if len(" ".join(s[:n]))<=50), s[:1])
combined.append(" ".join(items))
s = s[len(items):]
print(combined)
# ['Alice went to the market. She bought an apple.', 'And she then went to the park.']
Edit: changed the call to next() to supply a default value. This handles sentences that are already longer than 50 characters.
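The same loop can be packaged as a function with a configurable limit. The sketch below (combine is a hypothetical name) also exercises the s[:1] default passed to next(): without it, a sentence longer than the limit would make next() raise StopIteration; with it, that sentence comes out as a group of its own.

```python
def combine(sentences, limit=50):
    """Split sentences into consecutive space-joined groups of at most `limit` chars.

    Each group is the longest prefix of the remaining sentences that fits;
    a sentence already longer than `limit` becomes a group on its own.
    """
    combined = []
    s = list(sentences)  # work on a copy so the caller's list is untouched
    while s:
        items = next((s[:n] for n in range(len(s), 0, -1)
                      if len(" ".join(s[:n])) <= limit), s[:1])
        combined.append(" ".join(items))
        s = s[len(items):]
    return combined


long_sentence = "This sentence is deliberately written to be longer than fifty characters."
print(combine(['Short one.', long_sentence, 'Another short one.']))
```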
Answer 3 (score: 0)
A less elegant solution:
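result = []
string = ""
for element in x:
    # Length of the current group if this sentence were appended
    candidate = element if not string else string + " " + element
    if len(candidate) <= 50:
        string = candidate
    else:
        # Group is full: store it and start a new one with this sentence
        if string:
            result.append(string)
        string = element
if len(string) > 0:
    result.append(string)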