I am reading lines of the following form from a file:
line = a b c d,e,f g h i,j,k,l m n
What I want are lines without the ","-separated elements, e.g.
a b c d g h i m n
a b c d g h j m n
a b c d g h k m n
a b c d g h l m n
a b c e g h i m n
a b c e g h j m n
a b c e g h k m n
a b c e g h l m n
. . . . . . . . .
. . . . . . . . .
First I would split the line:
sline = line.split()
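Now I would iterate over sline and look for elements that can be split with "," as the separator. The problem is that I don't know in advance how many such elements to expect.
Any ideas?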
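Answer 0 (score: 3)
Your question is not very clear. If you want to strip everything after the first comma in each field (as your text suggests), a fairly readable one-liner should do it: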
cleaned_line = " ".join([field.split(",")[0] for field in line.split()])
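If you want to expand a line containing comma-separated fields into multiple lines (as your example suggests), you should use the itertools.product function:
import itertools
line = "a b c d,e,f g h i,j,k,l m n"
# Every field becomes a list of alternatives (a single-item list if it has no comma).
line_fields = [field.split(",") for field in line.split()]
for expanded_line_fields in itertools.product(*line_fields):
    print(" ".join(expanded_line_fields))
Here is the output: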
a b c d g h i m n
a b c d g h j m n
a b c d g h k m n
a b c d g h l m n
a b c e g h i m n
a b c e g h j m n
a b c e g h k m n
a b c e g h l m n
a b c f g h i m n
a b c f g h j m n
a b c f g h k m n
a b c f g h l m n
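The same idea can be wrapped in a small generator so it can be applied to every line read from the file. This is only a minimal sketch: the name expand_line is made up for illustration, and it assumes fields are separated by whitespace and alternatives by commas.

```python
import itertools

def expand_line(line):
    """Yield every combination of a line whose fields may hold comma-separated alternatives."""
    # Each whitespace-separated field becomes a list of one or more alternatives.
    fields = [field.split(",") for field in line.split()]
    # product() picks one alternative per field, with the rightmost field varying fastest.
    for combo in itertools.product(*fields):
        yield " ".join(combo)

expanded = list(expand_line("a b c d,e,f g h i,j,k,l m n"))
print(len(expanded))  # 3 alternatives * 4 alternatives = 12 lines
print(expanded[0])
print(expanded[-1])
```

Because it is a generator, the combinations are produced one at a time, which matters when a line has many comma-separated fields.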
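If for some reason it is important to preserve the original spacing, you can replace line.split() with re.findall("([^ ]+| +)", line):
import re
import itertools
line = "a b c d,e,f g h i,j,k,l m n"
# Runs of spaces are kept as tokens of their own, so the pieces are joined with "".
line_fields = [field.split(",") for field in re.findall("([^ ]+| +)", line)]
for expanded_line_fields in itertools.product(*line_fields):
    print("".join(expanded_line_fields))
Here is the output: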
a b c d g h i m n
a b c d g h j m n
a b c d g h k m n
a b c d g h l m n
a b c e g h i m n
a b c e g h j m n
a b c e g h k m n
a b c e g h l m n
a b c f g h i m n
a b c f g h j m n
a b c f g h k m n
a b c f g h l m n
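The sample line only has single spaces, so the effect is easier to see on a line with irregular spacing. A small sketch under that assumption; the helper name expand_keep_spacing and the test line are made up for illustration:

```python
import re
import itertools

def expand_keep_spacing(line):
    # Tokens alternate between runs of non-space characters and runs of spaces.
    # A run of spaces contains no comma, so it splits into a single "alternative".
    tokens = re.findall(r"[^ ]+| +", line)
    choices = [token.split(",") for token in tokens]
    return ["".join(combo) for combo in itertools.product(*choices)]

result = expand_keep_spacing("a   b,c  d")
print(result)
```

Note that only the space character counts as spacing here; tabs would be treated as part of a field.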
Answer 1 (score: 3)
Using regex, itertools.product and some string formatting.
This solution also preserves the original spacing.
>>> import re
>>> from itertools import product
>>> line = 'a b c d,e,f g h i,j,k,l m n'
>>> items = [x[0].split(',') for x in re.findall(r'((\w+,)+\w+)',line)]
>>> strs = re.sub(r'((\w+,)+\w+)','{}',line)
>>> for prod in product(*items):
...     print(strs.format(*prod))
...
a b c d g h i m n
a b c d g h j m n
a b c d g h k m n
a b c d g h l m n
a b c e g h i m n
a b c e g h j m n
a b c e g h k m n
a b c e g h l m n
a b c f g h i m n
a b c f g h j m n
a b c f g h k m n
a b c f g h l m n
Another example:
>>> line = 'a b c d,e,f g h i,j,k,l m n q,w,e,r f o o'
>>> items = [x[0].split(',') for x in re.findall(r'((\w+,)+\w+)',line)]
>>> strs = re.sub(r'((\w+,)+\w+)','{}',line)
>>> for prod in product(*items):
...     print(strs.format(*prod))
...
a b c d g h i m n q f o o
a b c d g h i m n w f o o
a b c d g h i m n e f o o
a b c d g h i m n r f o o
a b c d g h j m n q f o o
a b c d g h j m n w f o o
a b c d g h j m n e f o o
a b c d g h j m n r f o o
a b c d g h k m n q f o o
a b c d g h k m n w f o o
a b c d g h k m n e f o o
a b c d g h k m n r f o o
a b c d g h l m n q f o o
a b c d g h l m n w f o o
a b c d g h l m n e f o o
a b c d g h l m n r f o o
a b c e g h i m n q f o o
a b c e g h i m n w f o o
a b c e g h i m n e f o o
a b c e g h i m n r f o o
a b c e g h j m n q f o o
a b c e g h j m n w f o o
a b c e g h j m n e f o o
a b c e g h j m n r f o o
a b c e g h k m n q f o o
a b c e g h k m n w f o o
a b c e g h k m n e f o o
a b c e g h k m n r f o o
a b c e g h l m n q f o o
a b c e g h l m n w f o o
a b c e g h l m n e f o o
a b c e g h l m n r f o o
a b c f g h i m n q f o o
a b c f g h i m n w f o o
a b c f g h i m n e f o o
a b c f g h i m n r f o o
a b c f g h j m n q f o o
a b c f g h j m n w f o o
a b c f g h j m n e f o o
a b c f g h j m n r f o o
a b c f g h k m n q f o o
a b c f g h k m n w f o o
a b c f g h k m n e f o o
a b c f g h k m n r f o o
a b c f g h l m n q f o o
a b c f g h l m n w f o o
a b c f g h l m n e f o o
a b c f g h l m n r f o o
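One caveat with the template approach: str.format treats literal { and } in the line as placeholders, so a line containing braces would fail. A hedged sketch of one way around that, escaping the braces before building the template; the function name expand_with_template and the non-capturing pattern are my own choices, not part of the session above:

```python
import re
from itertools import product

COMMA_GROUP = re.compile(r"\w+(?:,\w+)+")  # two or more words joined by commas

def expand_with_template(line):
    # findall returns whole matches because the pattern has no capturing group.
    items = [group.split(",") for group in COMMA_GROUP.findall(line)]
    # Double any literal braces so str.format treats them as plain text.
    escaped = line.replace("{", "{{").replace("}", "}}")
    template = COMMA_GROUP.sub("{}", escaped)
    return [template.format(*combo) for combo in product(*items)]

lines = expand_with_template("x{1}  aa,bb  y")
print(lines)
```

A line without any comma group comes back unchanged as a single-element list, since product() of no iterables yields one empty tuple.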
Answer 2 (score: 1)
If I have understood your example correctly, you need the following:
import itertools
sss = "a b c d,e,f g h i,j,k,l m n d,e,f "
coma_separated = [i for i in sss.split() if ',' in i]
spited_coma_separated = [i.split(',') for i in coma_separated]
symbols = (i for i in itertools.product(*spited_coma_separated))
# use a generator expression to save memory
for s in symbols:
    st = sss
    for part, symb in zip(coma_separated, s):
        st = st.replace(part, symb, 1)  # replace only the first occurrence, so an
                                        # identical comma-separated group later in
                                        # the line is left for its own turn
    print (st.split())  # parenthesized for Python 3 compatibility
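Answer 3 (score: 1)
Most of the other answers produce only one line instead of the multiple lines you seem to want.
There are several ways to achieve what you want.
The recursive solution seems the most intuitive to me:
def dothestuff(l):
    for n, i in enumerate(l):
        if ',' in i:
            # found a "," entry
            items = i.split(',')
            for j in items:
                # expand the remainder of the list for each alternative
                for rest in dothestuff(l[n+1:]):
                    yield l[:n] + [j] + rest
            return
    # no "," entry left: the list is already fully expanded
    yield l
line = "a b c d,e,f g h i,j,k,l m n"
for i in dothestuff(line.split()): print(i)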
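As a sanity check, the recursive generator should produce the same combinations, in the same order, as itertools.product. The sketch below restates the function under the hypothetical name expand_recursive so the block is self-contained, and compares the two results:

```python
import itertools

def expand_recursive(fields):
    # Same idea as dothestuff above: branch on the first field holding a comma.
    for n, field in enumerate(fields):
        if "," in field:
            for choice in field.split(","):
                for rest in expand_recursive(fields[n + 1:]):
                    yield fields[:n] + [choice] + rest
            return
    yield fields

fields = "a b c d,e,f g h i,j,k,l m n".split()
recursive = list(expand_recursive(fields))
via_product = [list(c) for c in itertools.product(*(f.split(",") for f in fields))]
print(recursive == via_product)
```

The recursion depth equals the number of comma-separated fields, so this is fine for short lines; for lines with very many such fields the itertools.product version avoids the nesting.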
Answer 4 (score: 0)
i = 0
while i < len(line) - 1:
    if line[i] == ',':
        # drop the comma and the character after it, then re-check position i
        line = line[:i] + line[i+2:]
    else:
        i += 1
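If the goal is only to drop everything after the first comma in each field, a regular expression may be simpler than index arithmetic, and it also handles alternatives longer than one character. A minimal sketch, assuming fields are separated by whitespace; the pattern is my own choice, not part of the answer above:

```python
import re

line = "a b c d,e,f g h i,j,k,l m n"
# A comma followed by any run of non-whitespace characters is removed,
# which leaves only the first alternative of each field.
cleaned = re.sub(r",\S*", "", line)
print(cleaned)
```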
Answer 5 (score: 0)
import itertools
line_data = 'a b c d,e,f g h i,j,k,l m n'
# positions of the fields that contain comma-separated alternatives
comma_fields_indices = [i for i,val in enumerate(line_data.split()) if "," in val]
comma_fields = [i.split(",") for i in line_data.split() if "," in i]
all_comb = []
for val in itertools.product(*comma_fields):
    sline_data = line_data.split()
    # put the chosen alternative back at the position of its original field
    for index,word in enumerate(val):
        sline_data[comma_fields_indices[index]] = word
    all_comb.append(" ".join(sline_data))
print(all_comb)
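When a line has many comma-separated fields, the list of all combinations can get large. A possible variation, sketched here under the hypothetical name iter_combinations, yields the combinations one at a time instead of collecting them:

```python
import itertools

def iter_combinations(line_data):
    fields = line_data.split()
    # Positions of the fields that contain comma-separated alternatives.
    comma_positions = [i for i, field in enumerate(fields) if "," in field]
    alternatives = [fields[i].split(",") for i in comma_positions]
    for combo in itertools.product(*alternatives):
        current = list(fields)  # fresh copy for each combination
        for position, word in zip(comma_positions, combo):
            current[position] = word
        yield " ".join(current)

# Only the first three combinations are ever built here.
first_three = list(itertools.islice(iter_combinations("a b c d,e,f g h i,j,k,l m n"), 3))
for text in first_three:
    print(text)
```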