Python:拆分混合字符串

时间:2013-06-25 08:26:47

标签: python

我从以下形式的文件中读取了一些行:

line = a   b  c  d,e,f    g   h  i,j,k,l   m   n

我想要的是没有“,” - 分隔元素的行,例如

a   b  c  d    g   h  i   m   n 
a   b  c  d    g   h  j   m   n
a   b  c  d    g   h  k   m   n
a   b  c  d    g   h  l   m   n
a   b  c  e    g   h  i   m   n
a   b  c  e    g   h  j   m   n
a   b  c  e    g   h  k   m   n
a   b  c  e    g   h  l   m   n
.   .  .  .    .   .  .   .   .
.   .  .  .    .   .  .   .   .

首先我会拆分line

sline = line.split()

现在我将迭代sline并查找可以用“,”作为分隔符拆分的元素。问题是我不知道我所期望的那些元素总是有多少。 有什么想法吗?

6 个答案:

答案 0 :(得分:3)

你的问题不是很清楚。如果你想在逗号之后剥离任何部分(正如你的文字所示),那么一个相当可读的单行应该这样做:

cleaned_line = " ".join([field.split(",")[0] for field in line.split()])

如果您想展开包含逗号分隔字段的行到多行(如您的示例所示),那么您应该使用itertools.product函数:

import itertools
line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
line_fields = [field.split(",") for field in line.split()]
for expanded_line_fields in itertools.product(*line_fields):
    print " ".join(expanded_line_fields)

这是输出:

a b c d g h i m n
a b c d g h j m n
a b c d g h k m n
a b c d g h l m n
a b c e g h i m n
a b c e g h j m n
a b c e g h k m n
a b c e g h l m n
a b c f g h i m n
a b c f g h j m n
a b c f g h k m n
a b c f g h l m n

如果由于某种原因保持原始间距很重要,那么您可以将line.split()替换为re.findall("([^ ]*| *)", line)

import re
import itertools
line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
line_fields = [field.split(",") for field in re.findall("([^ ]+| +)", line)]
for expanded_line_fields in itertools.product(*line_fields):
    print "".join(expanded_line_fields)

这是输出:

a   b  c  d    g   h  i   m   n
a   b  c  d    g   h  j   m   n
a   b  c  d    g   h  k   m   n
a   b  c  d    g   h  l   m   n
a   b  c  e    g   h  i   m   n
a   b  c  e    g   h  j   m   n
a   b  c  e    g   h  k   m   n
a   b  c  e    g   h  l   m   n
a   b  c  f    g   h  i   m   n
a   b  c  f    g   h  j   m   n
a   b  c  f    g   h  k   m   n
a   b  c  f    g   h  l   m   n

答案 1 :(得分:3)

使用regexitertools.product和一些字符串格式:

此解决方案也保留了初始间距。

>>> import re
>>> from itertools import product
>>> line = 'a   b  c  d,e,f    g   h  i,j,k,l   m   n'
>>> items = [x[0].split(',') for x in re.findall(r'((\w+,)+\w)',line)]
>>> strs = re.sub(r'((\w+,)+\w+)','{}',line)
>>> for prod in product(*items):
...     print (strs.format(*prod))
...     
a   b  c  d    g   h  i   m   n
a   b  c  d    g   h  j   m   n
a   b  c  d    g   h  k   m   n
a   b  c  d    g   h  l   m   n
a   b  c  e    g   h  i   m   n
a   b  c  e    g   h  j   m   n
a   b  c  e    g   h  k   m   n
a   b  c  e    g   h  l   m   n
a   b  c  f    g   h  i   m   n
a   b  c  f    g   h  j   m   n
a   b  c  f    g   h  k   m   n
a   b  c  f    g   h  l   m   n

另一个例子:

>>> line = 'a   b  c  d,e,f    g   h  i,j,k,l   m   n q,w,e,r  f o   o'
>>> items = [x[0].split(',') for x in re.findall(r'((\w+,)+\w)',line)]
>>> strs = re.sub(r'((\w+,)+\w+)','{}',line)
for prod in product(*items):
    print (strs.format(*prod))
...     
a   b  c  d    g   h  i   m   n q  f o   o
a   b  c  d    g   h  i   m   n w  f o   o
a   b  c  d    g   h  i   m   n e  f o   o
a   b  c  d    g   h  i   m   n r  f o   o
a   b  c  d    g   h  j   m   n q  f o   o
a   b  c  d    g   h  j   m   n w  f o   o
a   b  c  d    g   h  j   m   n e  f o   o
a   b  c  d    g   h  j   m   n r  f o   o
a   b  c  d    g   h  k   m   n q  f o   o
a   b  c  d    g   h  k   m   n w  f o   o
a   b  c  d    g   h  k   m   n e  f o   o
a   b  c  d    g   h  k   m   n r  f o   o
a   b  c  d    g   h  l   m   n q  f o   o
a   b  c  d    g   h  l   m   n w  f o   o
a   b  c  d    g   h  l   m   n e  f o   o
a   b  c  d    g   h  l   m   n r  f o   o
a   b  c  e    g   h  i   m   n q  f o   o
a   b  c  e    g   h  i   m   n w  f o   o
a   b  c  e    g   h  i   m   n e  f o   o
a   b  c  e    g   h  i   m   n r  f o   o
a   b  c  e    g   h  j   m   n q  f o   o
a   b  c  e    g   h  j   m   n w  f o   o
a   b  c  e    g   h  j   m   n e  f o   o
a   b  c  e    g   h  j   m   n r  f o   o
a   b  c  e    g   h  k   m   n q  f o   o
a   b  c  e    g   h  k   m   n w  f o   o
a   b  c  e    g   h  k   m   n e  f o   o
a   b  c  e    g   h  k   m   n r  f o   o
a   b  c  e    g   h  l   m   n q  f o   o
a   b  c  e    g   h  l   m   n w  f o   o
a   b  c  e    g   h  l   m   n e  f o   o
a   b  c  e    g   h  l   m   n r  f o   o
a   b  c  f    g   h  i   m   n q  f o   o
a   b  c  f    g   h  i   m   n w  f o   o
a   b  c  f    g   h  i   m   n e  f o   o
a   b  c  f    g   h  i   m   n r  f o   o
a   b  c  f    g   h  j   m   n q  f o   o
a   b  c  f    g   h  j   m   n w  f o   o
a   b  c  f    g   h  j   m   n e  f o   o
a   b  c  f    g   h  j   m   n r  f o   o
a   b  c  f    g   h  k   m   n q  f o   o
a   b  c  f    g   h  k   m   n w  f o   o
a   b  c  f    g   h  k   m   n e  f o   o
a   b  c  f    g   h  k   m   n r  f o   o
a   b  c  f    g   h  l   m   n q  f o   o
a   b  c  f    g   h  l   m   n w  f o   o
a   b  c  f    g   h  l   m   n e  f o   o
a   b  c  f    g   h  l   m   n r  f o   o

答案 2 :(得分:1)

如果我已正确理解您的示例您需要关注

import itertools
sss = "a   b  c  d,e,f    g   h  i,j,k,l   m   n  d,e,f "
coma_separated = [i for i in sss.split() if ',' in i]
spited_coma_separated = [i.split(',') for i in coma_separated]
symbols = (i for i in itertools.product(*spited_coma_separated)) 
                     #use generator statement to save memory
for s in symbols:
    st = sss
    for part, symb in zip(coma_separated, s):
        st = st.replace(part, symb, 1) # To prevent replacement of the 
                                       # same coma separated group replace once 
                                       # for first occurance
    print (st.split()) # for python3 compatibility

答案 3 :(得分:1)

大多数其他答案只生成一行而不是您想要的多行。

为了实现您的目标,您可以通过多种方式开展工作。

递归解决方案对我来说似乎最直观:

def dothestuff(l):
    for n, i in enumerate(l):
        if ',' in i:
            # found a "," entry
            items = i.split(',')
            for j in items:
                for rest in dothestuff(l[n+1:]):
                    yield l[:n] + [j] + rest
            return
    yield l


line = "a   b  c  d,e,f    g   h  i,j,k,l   m   n"
for i in dothestuff(line.split()): print i

答案 4 :(得分:0)

for i in range(len(line)-1):
    if line[i] == ',':
        line = line.replace(line[i]+line[i+1], '')

答案 5 :(得分:0)

import itertools
line_data = 'a   b  c  d,e,f    g   h  i,j,k,l   m   n'
comma_fields_indices = [i for i,val in enumerate(line_data.split()) if "," in val]
comma_fields = [i.split(",") for i in line_data.split() if "," in i]
all_comb = []
for val in itertools.product(*comma_fields):
    sline_data = line_data.split()
    for index,word in enumerate(val):
        sline_data[comma_fields_indices[index]] = word
    all_comb.append(" ".join(sline_data))
print all_comb