How do I copy multiple lines?

Asked: 2012-04-18 23:44:00

Tags: python regex

I have the following file:

this is the first line
and this is the second line
now it is the third line
wow, the fourth line
but now it's the fifth line
etc...
etc...
etc...

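How can I copy the three lines from "now it is the third line" through "but now it's the fifth line" without knowing their line numbers? In Perl, you would do something like: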

/^now it is/../^but now/

What is the equivalent in Python?

This is what I have so far (it obviously grabs only one of the lines):

regex = re.compile("now it is")
for line in content:
    if regex.match(line):
        print line

Edit:

reg = re.compile(r"now it is.*but now it.*", re.MULTILINE | re.DOTALL)

matches = reg.search(urllib2.urlopen(url).read())
for match in matches.group():
    print match

This prints:

n
o
w

i
t

i
s

.
.
.

That is, it returns individual characters rather than whole lines.

4 answers:

Answer 0 (score: 2)

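I think you just need to look at the re.MULTILINE flag. With it you can run a similar match and get the combined text of the lines you want.

Edit

The complete solution uses the re.MULTILINE and re.DOTALL flags together with a non-greedy regular expression: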

>>> text = """this is the first line
and this is the second line
now it is the third line
wow, the fourth line
but now it's the fifth line
etc...
etc...
etc..."""
>>> import re
>>> match = re.search('^(now it is.*?but now.*?)$', text, flags=re.MULTILINE|re.DOTALL)
>>> print match.group()
now it is the third line
wow, the fourth line
but now it's the fifth line
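
As a side note on the edit in the question: `match.group()` returns a single string, so looping over it yields one character at a time. Below is a minimal Python 3 sketch that reuses the pattern from this answer and splits the matched block back into whole lines. The helper name `copy_block` is made up for illustration.

```python
import re

text = """this is the first line
and this is the second line
now it is the third line
wow, the fourth line
but now it's the fifth line
etc...
etc...
etc..."""

# Same pattern as above: non-greedy, with MULTILINE so ^ and $ match at
# line boundaries and DOTALL so . can cross newlines.
pattern = re.compile(r"^(now it is.*?but now.*?)$", re.MULTILINE | re.DOTALL)


def copy_block(source):
    """Return the matched block as a list of lines (hypothetical helper)."""
    match = pattern.search(source)
    if match is None:
        return []
    # group() is one string; splitlines() turns it into whole lines.
    return match.group().splitlines()


for line in copy_block(text):
    print(line)
```

Iterating over the result of `splitlines()` rather than over the string itself is what gives lines instead of characters.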

Answer 1 (score: 2)

You can easily write a generator to do this:

def re_range(f, re_start, re_end):
    for line in f:
        if re_start.match(line):
            yield line
            break
    for line in f:
        yield line
        if re_end.match(line):
            break

You can call it like this:

import re

re_start = re.compile("now it is")
re_end = re.compile("but now")
with open('in.txt') as f:
    for line in re_range(f, re_start, re_end):
        print line,
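
The generator relies on both loops sharing one iterator, which is true for a file object but not for a list. Here is a Python 3 sketch of the same idea that calls `iter()` first so it should work with any iterable of lines. The sample list and the name `sample` are only there for illustration.

```python
import re


def re_range(lines, re_start, re_end):
    # One shared iterator, so the second loop resumes where the first stopped.
    it = iter(lines)
    for line in it:
        if re_start.match(line):
            yield line
            break
    for line in it:
        yield line
        if re_end.match(line):
            break


sample = [
    "this is the first line\n",
    "and this is the second line\n",
    "now it is the third line\n",
    "wow, the fourth line\n",
    "but now it's the fifth line\n",
    "etc...\n",
]

re_start = re.compile("now it is")
re_end = re.compile("but now")
for line in re_range(sample, re_start, re_end):
    print(line, end="")
```

If the end pattern never matches, this yields everything from the start line to the end of the input.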

Answer 2 (score: 1)

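with open("yourfile") as f:  # the name of your file, with extension, in quotes
    lines = f.readlines()

Now lines is a list of every line in the file: lines[0] is the first line, lines[1] the second, and so on. To get the third through fifth lines, use lines[2:5].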
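
Slicing by position needs the line numbers, which the question says are unknown. One way around that, sketched here in Python 3, is to look up the two indices from the line prefixes first and then slice. The function name `slice_between` and its behavior when a marker is missing are my own assumptions, not part of this answer.

```python
def slice_between(lines, start_prefix, end_prefix):
    """Return the lines from the first start match through the next end match."""
    start = next(
        (i for i, line in enumerate(lines) if line.startswith(start_prefix)),
        None,
    )
    if start is None:
        return []  # assumption: no start marker means nothing to copy
    end = next(
        (i for i in range(start + 1, len(lines)) if lines[i].startswith(end_prefix)),
        None,
    )
    if end is None:
        return lines[start:]  # assumption: no end marker means copy to the end
    return lines[start:end + 1]  # + 1 so the end line is included


lines = [
    "this is the first line\n",
    "and this is the second line\n",
    "now it is the third line\n",
    "wow, the fourth line\n",
    "but now it's the fifth line\n",
    "etc...\n",
]
print("".join(slice_between(lines, "now it is", "but now")), end="")
```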

Answer 3 (score: 1)

Something like this?

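import re

# Compile the patterns once instead of on every line
start = re.compile("now it is")
end = re.compile("but now")
valid = False
for line in open("/path/to/file.txt", "r"):
    if start.match(line):
        valid = True
    if valid:
        print line,  # trailing comma: the line already ends with a newline
    if end.match(line):
        valid = False  # cleared after printing, so the end line is included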

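This way you hold only one line in memory at a time, unlike readlines(), which loads the whole file into memory.

This assumes the regex patterns are unique within the text block. If that is not the case, please give more details on how to match the start and end lines.

If you only need to check how a line begins, it is even simpler: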

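valid = False
for line in open("/path/to/file.txt", "r"):
    if line.startswith("now it is"):
        valid = True
    if valid:
        print line,  # trailing comma: the line already ends with a newline
    if line.startswith("but now"):
        valid = False  # cleared after printing, so the end line is included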