I'm currently working with a data structure like this:
['t','h','i','s',' ','i','s',' ','q','u','e','r','y',' ','i','t','e','m',' ','1','t','h','i','s',' ','i','s',' ','q','u','e','r','y',' ','i','t','e','m',' ','2', ['t','h','i','s',' ','i','s',' ','a',' ','s','u','b','q','u','e','r','y'], 't','h','i','s',' ','i','s',' ','q','u','e','r','y',' ','i','t','e','m',' ','3']
I got this dataset by parsing a query string with the following answer from SO: https://stackoverflow.com/a/17141441
The query I parsed is:
(this is query item 1 this is query item 2(this is a subquery)this is query item 3)
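The problem is that the parser works on individual characters, which are appended to the list one at a time. I need to get back to a structure like this: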
['this is query item 1 this is query item 2', ['this is a subquery'], 'this is query item 3']
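I've been trying either to wrap something around the parser function or to run a post-processing step that merges the characters back into strings. Does anyone know a solution?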
Answer 0: (score: 2)
As a post-processing step, you can use itertools.groupby inside a recursive function:
    from itertools import groupby

    data = ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'q', 'u', 'e', 'r', 'y', ' ', 'i', 't', 'e', 'm', ' ', '1', 't', 'h',
            'i', 's',
            ' ', 'i', 's', ' ', 'q', 'u', 'e', 'r', 'y', ' ', 'i', 't', 'e', 'm', ' ', '2',
            ['t', 'h', 'i', 's', ' ', 'i', 's', ' ', 'a', ' ', 's', 'u', 'b', 'q', 'u', 'e', 'r', 'y'], 't', 'h', 'i', 's',
            ' ', 'i', 's', ' ', 'q', 'u', 'e', 'r', 'y', ' ', 'i', 't', 'e', 'm', ' ', '3']


    def join(lst):
        # Group consecutive items by whether they are nested lists or single characters.
        for is_list, group in groupby(lst, key=lambda x: isinstance(x, list)):
            if is_list:
                # Recurse into each nested list and keep it as a list in the result.
                yield from (list(join(value)) for value in group)
            else:
                # Concatenate a run of characters into a single string.
                yield ''.join(group)


    result = list(join(data))
    print(result)
Output
['this is query item 1this is query item 2', ['this is a subquery'], 'this is query item 3']
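This groups consecutive items by whether they are lists or characters. A run of characters is concatenated with the built-in str.join, and for each nested list the join function is called recursively. Note that the first string reads "item 1this" because the input list contains no space character between '1' and 't'.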
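If changing the parser is an option, the other route mentioned in the question is to collect whole strings while parsing, so no post-processing is needed. The following is only a minimal sketch under assumptions: parse_nested, flush, stack and buffer are hypothetical names, and this is not the code from the linked SO answer. It assumes parentheses are the only delimiters and that they are balanced.

```python
def parse_nested(text):
    """Parse parenthesised text into nested lists of strings (illustrative sketch)."""
    stack = [[]]   # lists under construction; stack[0] is the root
    buffer = []    # characters of the text run currently being read

    def flush():
        # Move any buffered characters into the current list as one string.
        if buffer:
            stack[-1].append(''.join(buffer))
            buffer.clear()

    for ch in text:
        if ch == '(':
            flush()
            stack.append([])          # start a new nested list
        elif ch == ')':
            flush()
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()    # close the nested list
            stack[-1].append(finished)
        else:
            buffer.append(ch)

    flush()
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


query = "(this is query item 1 this is query item 2(this is a subquery)this is query item 3)"
parsed = parse_nested(query)
print(parsed[0])
```

Because the outermost parentheses produce one list inside the root, parsed[0] holds the structure asked for in the question. Text runs are flushed only at a parenthesis or at the end of input, so each run between delimiters becomes one string.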