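Extracting a substring from a string that contains commas

Asked: 2017-11-29 19:11:36

Tags: python string

I have a list of strings in a file. I am trying to extract a substring from each string and print it. The strings look like this: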

Box1 is lifted\nInform the manufacturer
Box2 is lifted\nInform the manufacturer
Box3, Box4 is lifted\nInform the manufacturer
Box5, Box6 is lifted\nInform the manufacturer
Box7 is lifted\nInform the manufacturer

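From each line I have to extract the string before the \n and print it. I used the following Python regex to do this: term = r'.*-\s([\w\s]+)\\n'. This regex works for lines 1, 2 and the last line, but it does not work for lines 3 and 4 because those strings contain a comma. How should I modify my regex to handle that?

Expected result: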

Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted

Current result:

Box1 is lifted
Box2 is lifted
Box2 is lifted
Box2 is lifted
Box7 is lifted

7 Answers:

Answer 0 (score: 2)

If the format is consistent, you can split on the newline:

''.join(YOURSTRING.split('\n')[0].split(','))

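Edited because I missed the part about removing the commas.

Answer 1 (score: 2)

Regex is overkill for basic string manipulation like this. Use built-in string methods such as partition and replace: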

for line in lines:
    first, sep, last = line.partition('\n')
    newline = first.replace(',','')
    print(newline)

Edit: if \n is a literal two-character sequence (a backslash followed by n) in the lines read from the file, use r'\n' instead of '\n'.
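To illustrate the difference between the two cases, here is a minimal sketch. It assumes the file lines contain a literal backslash followed by n, as the question's sample suggests; the `lines` list and the `first_part` helper are made up for the example.

```python
# Sample lines written as raw strings, so "\n" here is two characters:
# a backslash and the letter n (an assumption about the file's contents).
lines = [
    r"Box1 is lifted\nInform the manufacturer",
    r"Box3, Box4 is lifted\nInform the manufacturer",
]

def first_part(line):
    """Return the text before the literal \\n marker, with commas removed."""
    first, sep, rest = line.partition(r'\n')
    return first.replace(',', '')

results = [first_part(line) for line in lines]

# Partitioning on a real newline finds no separator in these lines.
sep_for_real_newline = lines[0].partition('\n')[1]

for r in results:
    print(r)
```

If the separator is not found, partition returns the whole line as the first element, so the function would hand back the full line rather than raise an error.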

Answer 2 (score: 2)

The comma is not part of the \w or \s character classes. term = r'.*-\s([\w\s,]+)\\n' should do what you want.
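A quick way to check this is to run both patterns side by side. This is only a sketch: the pattern requires a "- " before the captured text, which the sample lines in the question do not show, so the "Note - " prefix below is an assumption, and `extract` is a hypothetical helper.

```python
import re

# Hypothetical input: the "Note - " prefix is assumed, because the pattern
# only matches when a hyphen and a whitespace character precede the text.
lines = [
    r"Note - Box1 is lifted\nInform the manufacturer",
    r"Note - Box3, Box4 is lifted\nInform the manufacturer",
]

old_term = r'.*-\s([\w\s]+)\\n'    # character class without the comma
new_term = r'.*-\s([\w\s,]+)\\n'   # character class with the comma added

def extract(term, line):
    """Return the captured text with commas stripped, or None if no match."""
    m = re.match(term, line)
    return m.group(1).replace(',', '') if m else None

old_results = [extract(old_term, line) for line in lines]
new_results = [extract(new_term, line) for line in lines]

print(old_results)
print(new_results)
```

The capture group still contains the comma after matching, so the replace call is what produces the comma-free output the question asks for.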

Answer 3 (score: 1)

Why not something as simple as term = r"[*]*(is lifted)"? Or, if you don't need it, don't use regex at all. Edit: I think this might work better: term = r"(Box[0-9])?(, Box[0-9])*(is lifted)"
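Note that the second pattern, as written, has no space between the box names and "is lifted", so it may not capture the box names on the sample lines. Below is a sketch of a variant in the same spirit; it is not the answerer's pattern, and `boxes_lifted` is a hypothetical helper name.

```python
import re

# Variant pattern (an assumption, not from the answer): one or more
# comma-separated box names, then whitespace, then "is lifted".
term = r'(Box\d+(?:,\s*Box\d+)*)\s+is lifted'

lines = [
    r"Box1 is lifted\nInform the manufacturer",
    r"Box3, Box4 is lifted\nInform the manufacturer",
]

def boxes_lifted(line):
    """Return 'BoxN [BoxM ...] is lifted' without commas, or None if no match."""
    m = re.search(term, line)
    if m is None:
        return None
    return m.group(1).replace(',', '') + ' is lifted'

results = [boxes_lifted(line) for line in lines]
for r in results:
    print(r)
```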

Answer 4 (score: 1)

How about something like this?

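ok = '''Box1 is lifted\\nInform the manufacturer
Box2 is lifted\\nInform the manufacturer
Box3, Box4 is lifted\\nInform the manufacturer
Box5, Box6 is lifted\\nInform the manufacturer
Box7 is lifted\\nInform the manufacturer
'''
# ok is already a str, so it can be split directly; wrapping it in StringIO
# would fail because StringIO objects have no split method.
# Split on the trailing sentence, collapse whitespace, then drop the literal
# \n and the commas. Empty pieces (after the last line) are skipped.
strings = [' '.join(x.split()).replace('\\n', '').replace(',', '')
           for x in ok.split('Inform the manufacturer') if x.strip()]
>>> for x in strings: print(x)
...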
Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted

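Answer 5 (score: 0)

Let me know if the following works for you.

# "text" is used instead of "input" to avoid shadowing the built-in function.
text = "Box3, Box4 is lifted\nInform the manufacturer"
text = text.replace(",", "", 1)  # remove the first comma only
print(text)
print(text[0:text.index("\n")])  # everything before the newline
text = "Box1 is lifted\nInform the manufacturer"
print(text[0:text.index("\n")])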

Answer 6 (score: 0)

You can try using a regex and capturing the group:

One-line solution:

import re
pattern=r'\w.+(?=\\n)'

print([re.search(pattern,line).group() for line in open('file','r')])

Output:

['Box1 is lifted', 'Box2 is lifted', 'Box3, Box4 is lifted', 'Box5, Box6 is lifted', 'Box7 is lifted']

Detailed solution:

import re
pattern=r'\w.+(?=\\n)'
with open('newt','r') as f:
    for line in f:
        print(re.search(pattern,line).group())

Output:

Box1 is lifted
Box2 is lifted
Box3, Box4 is lifted
Box5, Box6 is lifted
Box7 is lifted
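The output above still contains the commas, and re.search returns None for any line without the literal \n marker, which would make .group() raise an AttributeError. A minimal sketch that handles both points follows; the in-memory `lines` list stands in for the file and is an assumption.

```python
import re

pattern = r'\w.+(?=\\n)'  # same lookahead pattern as in the answer

# Stand-in for the file contents; the last entry has no literal \n marker.
lines = [
    r"Box1 is lifted\nInform the manufacturer",
    r"Box3, Box4 is lifted\nInform the manufacturer",
    "a line without the marker",
]

results = []
for line in lines:
    match = re.search(pattern, line)
    if match is None:
        continue  # skip lines where the pattern does not match
    results.append(match.group().replace(',', ''))  # drop the commas

for r in results:
    print(r)
```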