我有一个基于分隔符要分割的项目列表。我希望删除所有分隔符,并在分隔符出现两次时拆分列表。例如,如果分隔符为'X'
,则为以下列表:
['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
会变成:
[['a', 'b'], ['c', 'd'], ['f', 'g']]
请注意,最后一组未拆分。
我写了一些丑陋的代码来做到这一点,但我确信有更好的东西。如果你可以设置一个任意长度分隔符(即看到N个分隔符后拆分列表),可以加分。
答案 0 :(得分:13)
我不认为这会有一个很好的,优雅的解决方案(我当然希望被证明是错误的)所以我会建议一些简单明了的事情:
def nSplit(lst, delim, count=2):
output = [[]]
delimCount = 0
for item in lst:
if item == delim:
delimCount += 1
elif delimCount >= count:
output.append([item])
delimCount = 0
else:
output[-1].append(item)
delimCount = 0
return output
>>> nSplit(['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], 'X', 2)
[['a', 'b'], ['c', 'd'], ['f', 'g']]
答案 1 :(得分:4)
以下是使用itertools.groupby()
:
import itertools
class MultiDelimiterKeyCallable(object):
def __init__(self, delimiter, num_wanted=1):
self.delimiter = delimiter
self.num_wanted = num_wanted
self.num_found = 0
def __call__(self, value):
if value == self.delimiter:
self.num_found += 1
if self.num_found >= self.num_wanted:
self.num_found = 0
return True
else:
self.num_found = 0
def split_multi_delimiter(items, delimiter, num_wanted):
keyfunc = MultiDelimiterKeyCallable(delimiter, num_wanted)
return (list(item
for item in group
if item != delimiter)
for key, group in itertools.groupby(items, keyfunc)
if not key)
items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print list(split_multi_delimiter(items, "X", 2))
我必须说,对于相同的结果,cobbal的解决方案要简单得多。
答案 2 :(得分:4)
使用生成器函数通过列表维护迭代器的状态,以及到目前为止看到的分隔符字符数的计数:
l = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
def splitOn(ll, x, n):
cur = []
splitcount = 0
for c in ll:
if c == x:
splitcount += 1
if splitcount == n:
yield cur
cur = []
splitcount = 0
else:
cur.append(c)
splitcount = 0
yield cur
print list(splitOn(l, 'X', 2))
print list(splitOn(l, 'X', 1))
print list(splitOn(l, 'X', 3))
l += ['X','X']
print list(splitOn(l, 'X', 2))
print list(splitOn(l, 'X', 1))
print list(splitOn(l, 'X', 3))
打印:
[['a', 'b'], ['c', 'd'], ['f', 'g']]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g']]
[['a', 'b', 'c', 'd', 'f', 'g']]
[['a', 'b'], ['c', 'd'], ['f', 'g'], []]
[['a', 'b'], [], ['c', 'd'], [], ['f'], ['g'], [], []]
[['a', 'b', 'c', 'd', 'f', 'g']]
编辑:我也是groupby的忠实粉丝,这是我的目标:
from itertools import groupby
def splitOn(ll, x, n):
cur = []
for isdelim,grp in groupby(ll, key=lambda c:c==x):
if isdelim:
nn = sum(1 for c in grp)
while nn >= n:
yield cur
cur = []
nn -= n
else:
cur.extend(grp)
yield cur
与我之前的回答没有什么不同,只是让groupby负责迭代输入列表,创建分隔符匹配和非分隔符匹配字符组。不匹配的字符只是添加到当前元素上,匹配的字符组执行分解新元素的工作。对于长列表,这可能会更高效,因为groupby在C中完成所有工作,并且仍然只迭代列表一次。
答案 3 :(得分:3)
a = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
b = [[b for b in q if b != 'X'] for q in "".join(a).split("".join(['X' for i in range(2)]))]
这给出了
[['a', 'b'], ['c', 'd'], ['f', 'g']]
其中2是您想要的元素数量。最有可能采用更好的方法。
答案 4 :(得分:2)
非常难看,但是我想知道我是否可以将其作为一个单行使用,我想我会分享。我请求你不要将这个解决方案用于任何重要的事情。最后的('X', 3)
是分隔符,应该重复它的次数。
(lambda delim, count: map(lambda x:filter(lambda y:y != delim, x), reduce(lambda x, y: (x[-1].append(y) if y != delim or x[-1][-count+1:] != [y]*(count-1) else x.append([])) or x, ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])))('X', 2)
修改强>
这是一个细分。我还删除了一些冗余代码,这些代码在写出来时更加明显。 (也改为上面)
# Wrap everything in a lambda form to avoid repeating values
(lambda delim, count:
# Filter all sublists after construction
map(lambda x: filter(lambda y: y != delim, x), reduce(
lambda x, y: (
# Add the value to the current sub-list
x[-1].append(y) if
# but only if we have accumulated the
# specified number of delimiters
y != delim or x[-1][-count+1:] != [y]*(count-1) else
# Start a new sublist
x.append([]) or x,
['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g'], [[]])
)
)('X', 2)
答案 5 :(得分:1)
这是一个使用zip和生成器的干净漂亮的解决方案
#1 define traditional sequence split function
#if you only want it for lists, you can use indexing to make it shorter
def split(it, x):
to_yield = []
for y in it:
if x == y:
yield to_yield
to_yield = []
else:
to_yield.append(y)
if to_yield:
yield to_yield
#2 zip the sequence with its tail
#you could use itertools.chain to avoid creating unnecessary lists
zipped = zip(l, l[1:] + [''])
#3. remove ('X',not 'X')'s from the resulting sequence, and leave only the first position of each
# you can use list comprehension instead of generator expression
filtered = (x for x,y in zipped if not (x == 'X' and y != 'X'))
#4. split the result using traditional split
result = [x for x in split(filtered, 'X')]
这样split()更可重复使用。
令人惊讶的是python没有内置的。
编辑:
您可以轻松地调整它以获得更长的分割序列,重复步骤2-3并使用l [i:]进行压缩过滤,使其为0<我< = n。
答案 6 :(得分:1)
import re
map(list, re.sub('(?<=[a-z])X(?=[a-z])', '', ''.join(lst)).split('XX'))
这是一个列表 - &gt; string - &gt;列表转换并假定非分隔符都是小写字母。
答案 7 :(得分:0)
这是另一种方法:
def split_multi_delimiter(items, delimiter, num_wanted):
def remove_delimiter(objs):
return [obj for obj in objs if obj != delimiter]
ranges = [(index, index+num_wanted) for index in xrange(len(items))
if items[index:index+num_wanted] == [delimiter] * num_wanted]
last_end = 0
for range_start, range_end in ranges:
yield remove_delimiter(items[last_end:range_start])
last_end = range_end
yield remove_delimiter(items[last_end:])
items = ['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']
print list(split_multi_delimiter(items, "X", 2))
答案 8 :(得分:0)
In [6]: input = ['a', 'b', 'X', 'X', 'cc', 'XX', 'd', 'X', 'ee', 'X', 'X', 'f']
In [7]: [s.strip('_').split('_') for s in '_'.join(input).split('X_X')]
Out[7]: [['a', 'b'], ['cc', 'XX', 'd', 'X', 'ee'], ['f']]
这假设您可以使用输入中找不到的_
等保留字符。
答案 9 :(得分:0)
太聪明了一半,只提供了因为显而易见的正确方法看起来如此蛮力和丑陋:
class joiner(object):
def __init__(self, N, data = (), gluing = False):
self.data = data
self.N = N
self.gluing = gluing
def __add__(self, to_glue):
# Process an item from itertools.groupby, by either
# appending the data to the last item, starting a new item,
# or changing the 'gluing' state according to the number of
# consecutive delimiters that were found.
N = self.N
data = self.data
item = list(to_glue[1])
# A chunk of delimiters;
# return a copy of self with the appropriate gluing state.
if to_glue[0]: return joiner(N, data, len(item) < N)
# Otherwise, handle the gluing appropriately, and reset gluing state.
a, b = (data[:-1], data[-1] if data else []) if self.gluing else (data, [])
return joiner(N, a + (b + item,))
def split_on_multiple(data, delimiter, N):
# Split the list into alternating groups of delimiters and non-delimiters,
# then use the joiner to join non-delimiter groups when the intervening
# delimiter group is short.
return sum(itertools.groupby(data, delimiter.__eq__), joiner(N)).data
答案 10 :(得分:0)
正则表达式,我选择你了!
import re
def split_multiple(delimiter, input):
pattern = ''.join(map(lambda x: ',' if x == delimiter else ' ', input))
filtered = filter(lambda x: x != delimiter, input)
result = []
for k in map(len, re.split(';', ''.join(re.split(',',
';'.join(re.split(',{2,}', pattern)))))):
result.append([])
for n in range(k):
result[-1].append(filtered.__next__())
return result
print(split_multiple('X',
['a', 'b', 'X', 'X', 'c', 'd', 'X', 'X', 'f', 'X', 'g']))
哦,你说的是Python,而不是Perl。