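I want to join a list of ids into a single string, with each id separated by ' OR '. In Python I can do this with
' OR '.join(list_of_ids)
I would like to know whether there is a way to keep this string from getting too large (in bytes). This matters to me because I use the string in an API, and that API enforces a maximum length of 4094 bytes. My solution is below; I am just wondering whether there is a better one?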
import sys

list_of_query_strings = []
substring = list_of_ids[0]
list_of_ids.pop(0)
while list_of_ids:
    new_addition = ' OR ' + list_of_ids[0]
    if sys.getsizeof(substring + new_addition) < 4094:
        substring += new_addition
    else:
        list_of_query_strings.append(substring)
        substring = list_of_ids[0]
    list_of_ids.pop(0)
list_of_query_strings.append(substring)
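One thing worth checking in the snippet above: sys.getsizeof reports the in-memory size of the str object, interpreter overhead included, which is not the number of bytes the text occupies once encoded. A minimal sketch of the difference, assuming the API counts UTF-8 bytes (the question does not say which encoding applies):

```python
import sys

query = "id1 OR id2"

# In-memory size of the str object, including interpreter overhead.
object_size = sys.getsizeof(query)

# Number of bytes the text occupies once encoded
# (assumption: the API counts UTF-8 bytes).
encoded_size = len(query.encode("utf-8"))

# A non-ASCII character takes more than one byte in UTF-8.
accented_size = len("é".encode("utf-8"))
```

Because the object size is larger than the encoded size, a getsizeof-based check probably cuts each chunk shorter than the limit requires.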
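Answer 0 (score: 3)
Just for fun, an over-engineered solution (it avoids the Schlemiel the Painter repeated-concatenation algorithm and lets you combine the pieces efficiently with str.join):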
from itertools import count, groupby

class CumulativeLengthGrouper:
    def __init__(self, joiner, maxblocksize):
        self.joinerlen = len(joiner)
        self.maxblocksize = maxblocksize
        self.groupctr = count()
        self.curgrp = next(self.groupctr)
        # Special cases initial case to cancel out treating first element
        # as requiring joiner, without requiring per call special case
        self.accumlen = -self.joinerlen

    def __call__(self, newstr):
        self.accumlen += self.joinerlen + len(newstr)
        # If accumulated length exceeds block limit...
        if self.accumlen > self.maxblocksize:
            # Move to new group
            self.curgrp = next(self.groupctr)
            self.accumlen = len(newstr)
        return self.curgrp
With this, you use itertools.groupby to split your iterable into pre-sized groups, then join each group without repeated concatenation:
mystrings = [...]
myblocks = [' OR '.join(grp) for _, grp in
            groupby(mystrings, key=CumulativeLengthGrouper(' OR ', 4094))]
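To see how groupby behaves with a stateful key, here is a small sketch of the same idea on toy data. It uses a closure instead of the class; make_length_key and the 12-character limit are hypothetical, chosen only to keep the example short:

```python
from itertools import groupby


def make_length_key(joiner, limit):
    """Return a stateful groupby key function (hypothetical helper)."""
    group = 0
    # Start at -len(joiner) so the first item is not charged for a joiner.
    length = -len(joiner)

    def key(item):
        nonlocal group, length
        length += len(joiner) + len(item)
        if length > limit:
            # Item does not fit: start a new group containing only this item.
            group += 1
            length = len(item)
        return group

    return key


ids = ["a1", "b22", "c333", "d4", "e55"]
blocks = [" OR ".join(grp)
          for _, grp in groupby(ids, key=make_length_key(" OR ", 12))]
# blocks == ['a1 OR b22', 'c333 OR d4', 'e55']
```

groupby calls the key once per item, in order, so consecutive items share a group until the running length exceeds the limit. An item that is longer than the limit on its own still ends up in a group by itself, so it may be worth validating for that case separately.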
If the goal is to produce strings of a given byte size under a specified encoding, CumulativeLengthGrouper can be adapted to accept a third constructor argument:
class CumulativeLengthGrouper:
    def __init__(self, joiner, maxblocksize, encoding='utf-8'):
        self.encoding = encoding
        self.joinerlen = len(joiner.encode(encoding))
        self.maxblocksize = maxblocksize
        self.groupctr = count()
        self.curgrp = next(self.groupctr)
        # Special cases initial case to cancel out treating first element
        # as requiring joiner, without requiring per call special case
        self.accumlen = -self.joinerlen

    def __call__(self, newstr):
        # Measure the encoded size, using the encoding stored on the instance
        newbytes = newstr.encode(self.encoding)
        self.accumlen += self.joinerlen + len(newbytes)
        # If accumulated length exceeds block limit...
        if self.accumlen > self.maxblocksize:
            # Move to new group
            self.curgrp = next(self.groupctr)
            self.accumlen = len(newbytes)
        return self.curgrp
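The byte-aware version relies on the encoded size of the joined string being the sum of the encoded sizes of its pieces. A quick sketch checking that for UTF-8, with made-up ids:

```python
joiner = " OR "
ids = ["naïve", "id2", "日本"]  # made-up ids with multi-byte characters
encoding = "utf-8"  # assumed encoding for this sketch

# Sum of the pieces, each encoded on its own.
piecewise = sum(len(s.encode(encoding)) for s in ids)
piecewise += len(joiner.encode(encoding)) * (len(ids) - 1)

# Size of the joined string, encoded in one go.
joined = len(joiner.join(ids).encode(encoding))

# With the generic "utf-16" codec, every encode() call prepends a 2-byte
# byte order mark, so per-piece sums overcount there.
bom_piece = len("a".encode("utf-16"))
```

For UTF-8 the two totals match. For codecs that emit a byte order mark on every call, such as the generic utf-16 codec, the per-piece total overestimates the real size, so blocks come out smaller than strictly necessary but never larger.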
Answer 1 (score: 1)
Here is a simpler solution than the existing ones:
list_of_query_strings = []
one_string = list_of_ids[0]
# Iterate over each id
for id_ in list_of_ids[1:]:
    # Add the id to the substring if it doesn't make it too large
    # (len() counts characters, which equals bytes only for ASCII ids)
    if len(one_string) + len(id_) + 4 < 4094:
        one_string += ' OR ' + id_
    # Substring too large, so add to the list and reset
    else:
        list_of_query_strings.append(one_string)
        one_string = id_
# Don't forget the final, partially filled string
list_of_query_strings.append(one_string)
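A possible variant of the same loop, wrapped in a generator that measures encoded bytes, tolerates an empty list, and joins each chunk once instead of concatenating repeatedly. This is only a sketch: chunk_ids and its parameters are hypothetical names, and it assumes the API counts UTF-8 bytes.

```python
def chunk_ids(ids, joiner=" OR ", max_bytes=4094, encoding="utf-8"):
    """Yield joined strings whose encoded size stays within max_bytes.

    Hypothetical helper; a single id larger than max_bytes is still
    yielded on its own rather than rejected.
    """
    joiner_len = len(joiner.encode(encoding))
    parts, size = [], 0
    for id_ in ids:
        id_len = len(id_.encode(encoding))
        # Cost of appending this id to the current chunk.
        extra = id_len + (joiner_len if parts else 0)
        if parts and size + extra > max_bytes:
            # Current chunk is full: emit it and start a new one.
            yield joiner.join(parts)
            parts, size = [], 0
            extra = id_len
        parts.append(id_)
        size += extra
    if parts:
        # Emit the final, partially filled chunk.
        yield joiner.join(parts)


# Tiny limit used only to make the chunking visible.
chunks = list(chunk_ids(["aaa", "bbb", "ccc", "ddd", "eee"], max_bytes=10))
# chunks == ['aaa OR bbb', 'ccc OR ddd', 'eee']
```

Unlike the loop above, this allows a chunk to reach exactly max_bytes, on the assumption that the limit is inclusive.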