A regularly generated computer message (simplified):
Hello user123,

- (604)7080900
- 152
- minutes

Regards
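Using Python, how can I extract "(604)7080900", "152" and "minutes" (i.e. any text following a leading "- " pattern) between the two empty lines (the empty lines being the \n\n after "Hello user123," and the \n\n before "Regards")? Even better if the resulting list of strings is stored in an array. Thanks!
Edit: the number of lines between the two blank lines is not fixed.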
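Second edit: this condition may have been unclear when I posted the question. In the example below, x1, x2 and x3 are good because the whole group is surrounded by two empty lines, and x4 is good for the same reason. x6 is not good because no blank line follows it, and x7 is not good because no blank line precedes it. x2 is good (unlike x6 and x7) because the line before it is a good line and the line after it is also good.
E.g.: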
hello

- x1
- x2
- x3

- x4

- x6
morning
- x7

world
Thanks
Answer 0 (score: 4)
>>> import re
>>>
>>> x="""Hello user123,
...
... - (604)7080900
... - 152
... - minutes
...
... Regards
... """
>>>
>>> re.findall(r"\n+\n-\s*(.*)\n-\s*(.*)\n-\s*(minutes)\s*\n\n+", x)
[('(604)7080900', '152', 'minutes')]
>>>
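Note that the pattern above is hard-wired to exactly three lines, the last of which must be "minutes". As a rough sketch (not part of the original answer), a variable number of "- " lines could be handled by matching the whole block first and splitting it afterwards. The name extract_blocks is made up for illustration, and the sketch assumes Unix line endings and blank lines that are truly empty (no stray spaces):

```python
import re

# A block is one or more "- ..." lines that are preceded by an empty line
# (lookbehind for "\n\n") and followed by an empty line (lookahead for "\n").
BLOCK = re.compile(r"(?<=\n\n)((?:- .*\n)+)(?=\n)")

def extract_blocks(text):
    """Return one list of values per block, with the '- ' prefix removed."""
    return [
        [line[2:].rstrip() for line in block.splitlines()]
        for block in BLOCK.findall(text)
    ]

message = "Hello user123,\n\n- (604)7080900\n- 152\n- minutes\n\nRegards\n"
print(extract_blocks(message))
# [['(604)7080900', '152', 'minutes']]
```

Because the lookahead demands an empty line right after the last "- " line, a group that runs straight into ordinary text is not matched at all, not even partially.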
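Answer 1 (score: 3)
The simplest approach is to go over the lines (assuming you have a list of lines, or a file, or you split the string into a list of lines) until you see a line that is just '\n', then check that each line starts with '- ' (using the startswith string method) and slice that prefix off, storing the result, until you find another empty line. For example: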
# If you have a single string, split it into lines.
L = s.splitlines()
# If you (now) have a list of lines, grab an iterator so we can continue
# iteration where it left off.
it = iter(L)
# Alternatively, if you have a file, just use that directly:
# it = open(....)
# The extracted values are collected here.
data = []
# Find the first empty line:
for line in it:
    # Treat lines of just whitespace as empty lines too. If you don't want
    # that, do 'if line == ""'.
    if not line.strip():
        break
# Now starts data.
for line in it:
    if not line.rstrip():
        # End of data.
        break
    if line.startswith('- '):
        # Slice off the '- ' prefix and any trailing whitespace.
        data.append(line[2:].rstrip())
    else:
        # Malformed data?
        raise ValueError("malformed line %r" % (line,))
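To illustrate the "if you have a file" variant, here is a small sketch of the same two-loop idea applied to a file-like object. io.StringIO stands in for a real file so the example is self-contained, and read_first_block is a hypothetical name, not something from the answer:

```python
import io

def read_first_block(fileobj):
    """Return the '- ' values between the first two blank lines of fileobj."""
    # Skip everything up to and including the first blank line.
    for line in fileobj:
        if not line.strip():
            break
    data = []
    # The same file iterator continues where the first loop stopped.
    for line in fileobj:
        if not line.strip():
            break  # second blank line: end of data
        if not line.startswith("- "):
            raise ValueError("malformed line %r" % (line,))
        # Lines read from a file keep their trailing newline; rstrip drops it.
        data.append(line[2:].rstrip())
    return data

fake_file = io.StringIO("Hello user123,\n\n- (604)7080900\n- 152\n- minutes\n\nRegards\n")
print(read_first_block(fake_file))
# ['(604)7080900', '152', 'minutes']
```

A file object is its own iterator, which is why no explicit iter() call is needed here.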
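Edit: since you elaborated on what you want to do, here is an updated version of the loop. It no longer loops twice; instead it collects data until it encounters a "bad" line, and either saves or discards the collected lines when it encounters a block separator. It does not need an explicit iterator because it never restarts iteration, so you can just pass it a list (or any iterable) of lines: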
def getblocks(L):
    # The list of good blocks (as lists of lines.) You can also make this
    # a flat list if you prefer.
    data = []
    # The list of good lines encountered in the current block
    # (but the block may still become bad.)
    block = []
    # Whether the current block is bad.
    bad = True
    for line in L:
        # Not in a 'good' block, and encountering the block separator.
        if bad and not line.rstrip():
            bad = False
            block = []
            continue
        # In a 'good' block and encountering the block separator.
        if not bad and not line.rstrip():
            # Save 'good' data, skipping empty blocks produced by several
            # blank lines in a row. Or, if you want a flat list of lines,
            # use 'extend' instead of 'append' (also below.)
            if block:
                data.append(block)
            block = []
            continue
        if not bad and line.startswith('- '):
            # A good line in a 'good' (not 'bad' yet) block; save the line,
            # minus the '- ' prefix and trailing whitespace.
            block.append(line[2:].rstrip())
            continue
        else:
            # A 'bad' line, invalidating the current block.
            bad = True
    # Don't forget to handle the last block, if it's good
    # (and if you want to handle the last block.)
    if not bad and block:
        data.append(block)
    return data
Here it is in action:
>>> L = """hello
...
... - x1
... - x2
... - x3
...
... - x4
...
... - x6
... morning
... - x7
...
... world""".splitlines()
>>> print(getblocks(L))
[['x1', 'x2', 'x3'], ['x4']]
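For comparison, the same "good block" rule can be sketched with itertools.groupby, which groups consecutive lines into alternating runs of blank and non-blank lines. This is an alternative formulation, not the answer's method, and good_blocks is a made-up name. One deliberate difference: this sketch requires a blank line on both sides, so a final block that runs up to the end of the input is rejected:

```python
from itertools import groupby

def good_blocks(lines):
    """Return blocks of '- ' lines that have a blank line before and after."""
    # Runs alternate between blank (True) and non-blank (False).
    runs = [(is_blank, list(group))
            for is_blank, group in groupby(lines, key=lambda l: not l.strip())]
    blocks = []
    for i, (is_blank, group) in enumerate(runs):
        if is_blank:
            continue
        # Since runs alternate, any neighbouring run is a blank one.
        preceded = i > 0
        followed = i < len(runs) - 1
        if preceded and followed and all(l.startswith("- ") for l in group):
            blocks.append([l[2:].rstrip() for l in group])
    return blocks

text = "hello\n\n- x1\n- x2\n- x3\n\n- x4\n\n- x6\nmorning\n- x7\n\nworld"
print(good_blocks(text.splitlines()))
# [['x1', 'x2', 'x3'], ['x4']]
```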
Answer 2 (score: 1)
>>> s = """Hello user123,
...
... - (604)7080900
... - 152
... - minutes
...
... Regards
... """
>>> import re
>>> re.findall(r'^- (.*)', s, re.M)
['(604)7080900', '152', 'minutes']
Answer 3 (score: 1)
s = """Hello user123,
- (604)7080900
- 152
- minutes
Regards
Hello user124,
- (604)8576576
- 345
- minutes
- seconds
- bla
Regards"""
Do this:
result = []
for data in s.split('Regards'):
    result.append([v.strip() for v in data.split('-')[1:]])
del result[-1] # remove empty list at end
And you get:
>>> result
[['(604)7080900', '152', 'minutes'],
['(604)8576576', '345', 'minutes', 'seconds', 'bla']]
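One caveat with splitting on '-': a value that itself contains a hyphen (say a phone number written as 604-708-0900) would be cut into pieces. A possible line-based sketch that avoids this is shown below. It assumes every value sits on its own line starting with "- " and that each message ends with "Regards"; parse_messages and its terminator parameter are hypothetical names:

```python
def parse_messages(text, terminator="Regards"):
    """Return one list of '- ' values per message in text."""
    result = []
    for chunk in text.split(terminator):
        # Keep only the lines that carry a value, minus the '- ' prefix.
        values = [line[2:].strip()
                  for line in chunk.splitlines()
                  if line.startswith("- ")]
        # Chunks without values (e.g. the text after the last terminator)
        # are skipped, so no trailing empty list has to be deleted.
        if values:
            result.append(values)
    return result

sample = "Hello user123,\n- 604-708-0900\n- 152\n- minutes\nRegards\nHello user124,\n- 345\nRegards"
print(parse_messages(sample))
# [['604-708-0900', '152', 'minutes'], ['345']]
```

Like the original, this ignores the blank-line condition from the question and relies only on the "Regards" terminator.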