Question

I have a list of strings in a file. I am trying to extract a substring from each string and print it. The strings look like this -

Box1 is lifted\nInform the manufacturer
Box2 is lifted\nInform the manufacturer
Box3, Box4 is lifted\nInform the manufacturer
Box5, Box6 is lifted\nInform the manufacturer
Box7 is lifted\nInform the manufacturer

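From each line I have to extract the string before the \n and print it. I used the following Python regex to do this - term = r'.*-\s([\w\s]+)\\n' This regex works for lines 1, 2, and the last line. But it does not work for lines 3 and 4, because those strings contain a comma. How should I modify my regex to accommodate that?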

Expected result -

Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted

Result currently obtained -

Box1 is lifted
Box2 is lifted
Box2 is lifted
Box2 is lifted
Box7 is lifted

Answer 1

If the format is consistent, you can just split on the newline:

''.join(YOURSTRING.split('\n')[0].split(','))

Edited because I missed the part about removing the commas.

Answer 2

Regular expressions are overkill for basic string manipulation like this. Use built-in string methods such as partition and replace:

for line in lines:
    # Keep only the text before the first newline
    first, sep, last = line.partition('\n')
    # Drop the commas from that text
    newline = first.replace(',', '')
    print(newline)

Edit: if \n is a literal two-character sequence in the lines read from the file, use r'\n' instead of '\n'.
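The difference between the two separators can be sketched as follows. This is only an illustration: the helper name before_separator and the two sample variables are made up here, and the code assumes the line is already in memory as a string.

```python
def before_separator(line, sep):
    """Return the text before sep, with commas removed (hypothetical helper)."""
    head, _, _ = line.partition(sep)
    return head.replace(',', '')

# A backslash followed by the letter n, as it would appear if the file
# literally contains the two characters "\n".
literal_line = r"Box3, Box4 is lifted\nInform the manufacturer"

# A real newline character inside the string.
newline_line = "Box3, Box4 is lifted\nInform the manufacturer"

print(before_separator(literal_line, r'\n'))
print(before_separator(newline_line, '\n'))
```

If the separator does not occur in the line, partition returns the whole line as the first element, so using '\n' on the literal form would leave the "Inform the manufacturer" part in place.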

Answer 3

The comma is not part of the \w or \s character classes. term = r'.*-\s([\w\s,]+)\\n' should do what you want.
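A minimal sketch of the widened character class, under two assumptions: the file holds a literal backslash followed by n, and the leading '.*-\s' part of the pattern is dropped because the sample lines shown in the question have no "- " prefix. The names term and extract are used for illustration only. The captured group still contains the comma, so it is removed afterwards to get the expected output.

```python
import re

# Assumption: each line contains a literal backslash + "n", not a real newline.
lines = [
    r"Box1 is lifted\nInform the manufacturer",
    r"Box3, Box4 is lifted\nInform the manufacturer",
]

# Simplified pattern: the character class now accepts commas as well.
term = re.compile(r'([\w\s,]+)\\n')

def extract(line):
    """Return the text before the literal \\n with commas removed, or None."""
    match = term.search(line)
    if match is None:
        return None
    return match.group(1).replace(',', '')

results = [extract(line) for line in lines]
for result in results:
    print(result)
```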

Answer 4

Why not something as simple as term = r"[*]*(is lifted)"? Or, if you don't need one, don't use a regex at all. Edit: I think this might work better: term = r"(Box[0-9])?(, Box[0-9])*(is lifted)"
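One possible variation on this idea, not the pattern given above: it allows multi-digit box numbers and includes the space before "is lifted". The names box_pattern and lifted_phrase are hypothetical, and the sketch assumes every box is written as "Box" followed by digits.

```python
import re

# Hypothetical variation: one or more "BoxN" items separated by ", ",
# followed by " is lifted".
box_pattern = re.compile(r'Box\d+(?:, Box\d+)* is lifted')

def lifted_phrase(line):
    """Return the matched phrase with commas removed, or None if absent."""
    match = box_pattern.search(line)
    return match.group(0).replace(',', '') if match else None

print(lifted_phrase(r"Box5, Box6 is lifted\nInform the manufacturer"))
print(lifted_phrase(r"Box7 is lifted\nInform the manufacturer"))
```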

Answer 5

How about something like this?

ok = '''Box1 is lifted\\nInform the manufacturer
Box2 is lifted\\nInform the manufacturer
Box3, Box4 is lifted\\nInform the manufacturer
Box5, Box6 is lifted\\nInform the manufacturer
Box7 is lifted\\nInform the manufacturer
'''
# Split on the trailing sentence, collapse whitespace, then drop the
# literal \n and the commas; empty leftover pieces are skipped.
strings = [' '.join(x.split()).replace('\\n', '').replace(',', '')
           for x in ok.split('Inform the manufacturer') if x.strip()]
for x in strings:
    print(x)

Output:

Box1 is lifted
Box2 is lifted
Box3 Box4 is lifted
Box5 Box6 is lifted
Box7 is lifted

Answer 6

Let me know if the following works for you.

text = "Box3, Box4 is lifted\nInform the manufacturer"
text = text.replace(",", "", 1)    # remove the first comma
print(text)                        # whole string, comma removed
print(text[0:text.index("\n")])    # only the part before the newline
text = "Box1 is lifted\nInform the manufacturer"
print(text[0:text.index("\n")])

Answer 7

You can try a regex and capture the group:

One-line solution:

import re
pattern=r'\w.+(?=\\n)'

print([re.search(pattern,line).group() for line in open('file','r')])

Output:

['Box1 is lifted', 'Box2 is lifted', 'Box3, Box4 is lifted', 'Box5, Box6 is lifted', 'Box7 is lifted']

Detailed solution:

import re
pattern=r'\w.+(?=\\n)'
with open('newt','r') as f:
    for line in f:
        print(re.search(pattern,line).group())

Output:

Box1 is lifted
Box2 is lifted
Box3, Box4 is lifted
Box5, Box6 is lifted
Box7 is lifted

Extracting a substring from a string that contains commas

7 answers: