I have a list of strings in a file. I am trying to extract a substring from each string and print it. The strings look like this -
Box1 is lifted\nInform the manufacturer
Box2 is lifted\nInform the manufacturer
Box3, Box4 is lifted\nInform the manufacturer
Box5, Box6 is lifted\nInform the manufacturer
Box7 is lifted\nInform the manufacturer
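From each line I have to extract the string before \n and print it. I used the following Python regex to do this - term = r'.*-\s([\w\s]+)\\n'
This regex works for lines 1, 2 and the last line, but not for lines 3 and 4, because those strings contain a comma (,). How should I modify my regex to handle that?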
Expected result -
Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted
Result currently obtained -
Box1 is lifted
Box2 is lifted
Box2 is lifted
Box2 is lifted
Box7 is lifted
Answer 0 (score: 2)
If the format is consistent, you can split on the newline:
''.join(YOURSTRING.split('\n')[0].split(','))
Edited, because I missed the part about removing the commas.
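A minimal sketch of the split approach above, assuming the lines were read from a file so that \n is the two literal characters backslash and n rather than a real newline. The helper name first_part and its sep parameter are hypothetical, introduced only for illustration.

```python
def first_part(line, sep='\\n'):
    """Return the text before the first separator, with commas removed."""
    head = line.split(sep)[0]
    return head.replace(',', '')


# Sample lines containing a literal backslash followed by "n".
lines = [
    'Box1 is lifted\\nInform the manufacturer',
    'Box3, Box4 is lifted\\nInform the manufacturer',
]

results = [first_part(line) for line in lines]
for r in results:
    print(r)
```

If the strings hold a real newline character instead, passing sep='\n' should give the same result.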
Answer 1 (score: 2)
Regex is overkill for basic string manipulation like this. Use built-in string methods such as partition and replace:
for line in lines:
    first, sep, last = line.partition('\n')
    newline = first.replace(',', '')
    print(newline)
Edit: if \n is a literal sequence in the lines read from the file, use r'\n' instead of '\n'.
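The literal-sequence case from the edit above can be sketched as follows. This assumes the file stores a backslash followed by n inside each line; io.StringIO stands in for the real file, and the names fake_file and extracted are placeholders. Checking sep is an extra guard, not part of the original answer: partition returns an empty separator when the sequence is missing.

```python
import io

# Stand-in for the real file; each line holds a literal backslash + "n".
fake_file = io.StringIO(
    'Box1 is lifted\\nInform the manufacturer\n'
    'Box3, Box4 is lifted\\nInform the manufacturer\n'
)

extracted = []
for line in fake_file:
    # r'\n' is the two-character sequence, not a newline.
    first, sep, rest = line.partition(r'\n')
    if not sep:
        continue  # separator missing: skip instead of emitting the whole line
    extracted.append(first.replace(',', ''))

print(extracted)
```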
Answer 2 (score: 2)
The comma is not part of the \w or \s character classes. term = r'.*-\s([\w\s,]+)\\n' should do what you want.
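A small sketch of why adding the comma to the character class matters. It assumes the sample lines exactly as shown in the question, which contain no hyphen, so the leading .*-\s part of the pattern is dropped here; with the real file that prefix may still be needed.

```python
import re

# A sample line with a literal backslash followed by "n".
line = 'Box3, Box4 is lifted\\nInform the manufacturer'

# Without the comma in the class, the match cannot start before the comma.
without_comma = re.search(r'([\w\s]+)\\n', line)
# With the comma in the class, the whole prefix is captured.
with_comma = re.search(r'([\w\s,]+)\\n', line)

print(repr(without_comma.group(1)))
print(repr(with_comma.group(1)))

# The expected result in the question has no commas, so strip them afterwards.
cleaned = with_comma.group(1).replace(',', '')
print(cleaned)
```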
Answer 3 (score: 1)
Why not something as simple as term = r"[*]*(is lifted)"? Or, if a regex is not required, don't use one.
Edit: I think this might work better: term = r"(Box[0-9])?(, Box[0-9])*(is lifted)"
Answer 4 (score: 1)
How about something like this?
ok = '''Box1 is lifted\\nInform the manufacturer
Box2 is lifted\\nInform the manufacturer
Box3, Box4 is lifted\\nInform the manufacturer
Box5, Box6 is lifted\\nInform the manufacturer
Box7 is lifted\\nInform the manufacturer
'''
# Split on the trailing sentence, collapse whitespace, then drop the literal \n and the commas.
# The "if x.strip()" filter discards the empty piece left after the last split.
strings = [' '.join(x.split()).replace('\\n', '').replace(',', '') for x in ok.split('Inform the manufacturer') if x.strip()]
for x in strings:
    print(x)
Output:
Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted
Answer 5 (score: 0)
Let me know if the following works for you.
text = "Box3, Box4 is lifted\nInform the manufacturer"
text = text.replace(",", "", 1)  # remove the first comma only
print(text)
print(text[0:text.index("\n")])  # everything before the newline
text = "Box1 is lifted\nInform the manufacturer"
print(text[0:text.index("\n")])
Answer 6 (score: 0)
You can try a regex and capture the group:
One-line solution:
import re
pattern=r'\w.+(?=\\n)'
print([re.search(pattern,line).group() for line in open('file','r')])
Output:
['Box1 is lifted', 'Box2 is lifted', 'Box3, Box4 is lifted', 'Box5, Box6 is lifted', 'Box7 is lifted']
Detailed solution:
import re
pattern = r'\w.+(?=\\n)'
with open('newt', 'r') as f:
    for line in f:
        print(re.search(pattern, line).group())
Output:
Box1 is lifted
Box2 is lifted
Box3, Box4 is lifted
Box5, Box6 is lifted
Box7 is lifted
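Two caveats with the lookahead approach: re.search returns None for a line that does not match (a blank line, for example), so calling .group() directly would raise AttributeError, and the output above still contains the commas that the question wants removed. A hedged sketch that handles both, using an in-memory list in place of the file (the names PATTERN and matches are placeholders):

```python
import re

# Same pattern as above: text starting at a word character, up to a literal "\n".
PATTERN = re.compile(r'\w.+(?=\\n)')

lines = [
    'Box1 is lifted\\nInform the manufacturer',
    '',  # a blank line, which the pattern does not match
    'Box3, Box4 is lifted\\nInform the manufacturer',
]

matches = []
for line in lines:
    m = PATTERN.search(line)
    if m is None:
        continue  # skip lines without the literal \n sequence
    matches.append(m.group().replace(',', ''))

for text in matches:
    print(text)
```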