Deleting multiple lines in Python

Date: 2016-01-01 20:33:20

Tags: python

I have a file that looks like this:

<VirtualHost *:80>
    ServerName Url1
    DocumentRoot Url1Dir
</VirtualHost>

<VirtualHost *:80>
    ServerName Url2
    DocumentRoot Url2Dir
</VirtualHost>

<VirtualHost *:80>
    ServerName REMOVE
</VirtualHost>

<VirtualHost *:80>
    ServerName Url3
    DocumentRoot Url3Dir
</VirtualHost>

I want to remove this block (it never changes):

<VirtualHost *:80>
    ServerName REMOVE
</VirtualHost>

I tried to match the whole block with the code below, but it doesn't seem to work.

with open("out.txt", "wt") as fout:
    with open("in.txt", "rt") as fin:
        for line in fin:
            fout.write(line.replace("<VirtualHost *:80>\n    ServerName REMOVE\n</VirtualHost>\n", ""))

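I searched for a solution to my problem but came up empty-handed, so any help is much appreciated.

Before you downvote, I would really like to hear the reason.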

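3 answers:

Answer 0: (score: 4)

The quickest way is to read the whole file into a string, perform the replacement, and then write the string to the desired file. For example: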

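#!/usr/bin/python

# Read the whole file so the multi-line block can be matched in one pass.
with open('in.txt', 'r') as f:
    text = f.read()

# Remove the block together with the blank line that follows it.
text = text.replace("<VirtualHost *:80>\n    ServerName REMOVE\n</VirtualHost>\n\n", '')

with open('out.txt', 'w') as f:
    f.write(text)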
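
As a minimal sketch of why the whole-file replacement succeeds where the loop in the question does not: iterating over a file yields one line at a time, so a three-line search string can never match any single piece. The snippet below uses an in-memory string instead of `in.txt`, and the names `BLOCK`, `sample`, `per_line` and `whole` are made up for illustration.

```python
# The block to remove, exactly as it appears in the question.
BLOCK = "<VirtualHost *:80>\n    ServerName REMOVE\n</VirtualHost>\n"

# A shortened stand-in for the contents of in.txt (an assumption for this demo).
sample = (
    "<VirtualHost *:80>\n"
    "    ServerName Url1\n"
    "    DocumentRoot Url1Dir\n"
    "</VirtualHost>\n"
    "\n"
    "<VirtualHost *:80>\n"
    "    ServerName REMOVE\n"
    "</VirtualHost>\n"
    "\n"
    "<VirtualHost *:80>\n"
    "    ServerName Url3\n"
    "    DocumentRoot Url3Dir\n"
    "</VirtualHost>\n"
)

# Line by line: every piece is a single line, so the multi-line block never matches.
per_line = "".join(
    line.replace(BLOCK, "") for line in sample.splitlines(keepends=True)
)

# Whole text: the block and the blank line after it are found and removed.
whole = sample.replace(BLOCK + "\n", "")

print(per_line == sample)            # the per-line attempt changed nothing
print("ServerName REMOVE" in whole)  # the whole-text attempt removed the block
```

Reading everything into memory is fine for a configuration file of this size; for very large files a streaming approach would be a better fit.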

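Answer 1: (score: 1)

Here is a finite-automaton solution that can be modified later during development. It may look complicated at first, but note that you can read the code for each status value independently. You can draw the graph on paper (nodes as circles, arrows as directed edges) to get an overview of what is being done.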

status = 0      # init -- waiting for the VirtualHost section
lst = []        # lines of the VirtualHost section
with open("in.txt") as fin, open("out.txt", "w") as fout:
    for line in fin:

        #-----------------------------------------------------------
        # Waiting for the VirtualHost section, copying.
        if status == 0: 
            if line.startswith("<VirtualHost"):
                # The section was found. Postpone the output.
                lst = [ line ]  # first line of the section
                status = 1
            else:
                # Copy the line to the output.
                fout.write(line)

        #-----------------------------------------------------------
        # Waiting for the end of the section, collecting.
        elif status == 1:   
            if line.startswith("</VirtualHost"):
                # The end of the section found, and the section
                # should not be ignored. Write it to the output.
                lst.append(line)            # collect the line
                fout.write(''.join(lst))    # write the section
                status = 0  # change the status to "outside the section"
                lst = []    # not necessary, but less error-prone for future modifications
            else:
                lst.append(line)    # collect the line
                if 'ServerName REMOVE' in line: # Should this section be ignored?
                    status = 2      # special status for ignoring this section
                    lst = []        # not necessary

        #-----------------------------------------------------------
        # Waiting for the end of the section that should be ignored.
        elif status == 2:   
            if line.startswith("</VirtualHost"):
                # The end of the section found, but the section should be ignored.
                status = 0  # outside the section
                lst = []    # not necessary
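
The same automaton can also be sketched as a generator with named states, which may make the graph easier to read and lets you try it on a list of lines without touching any files. This is only an illustrative variant: `State`, `filter_sections` and the `marker` parameter are names invented here, not part of the code above.

```python
from enum import Enum


class State(Enum):
    OUTSIDE = 0      # copying lines that are outside any section
    COLLECTING = 1   # inside a section, buffering its lines
    SKIPPING = 2     # inside a section that must be dropped


def filter_sections(lines, marker="ServerName REMOVE"):
    """Yield lines, leaving out every VirtualHost section that contains marker."""
    state = State.OUTSIDE
    buffered = []
    for line in lines:
        if state is State.OUTSIDE:
            if line.startswith("<VirtualHost"):
                buffered = [line]          # postpone output until the section ends
                state = State.COLLECTING
            else:
                yield line
        elif state is State.COLLECTING:
            buffered.append(line)
            if line.startswith("</VirtualHost"):
                yield from buffered        # the section is kept
                buffered = []
                state = State.OUTSIDE
            elif marker in line:
                buffered = []              # the section is dropped
                state = State.SKIPPING
        elif state is State.SKIPPING:
            if line.startswith("</VirtualHost"):
                state = State.OUTSIDE
```

With files, it could be used as `fout.writelines(filter_sections(fin))`. Like the code above, it keeps the blank line that followed the removed section.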

Answer 2: (score: 1)

While the answer above is a pragmatic approach, it is fragile and inflexible to begin with. Here is something less fragile:

import re

def remove_entry(servername, filename):
    """Parse the file, look for the entry pattern and return the new content.

    :param str servername: The server name to look for
    :param str filename: The path of the file to parse
    :return: The file content without the matching entry
    :rtype: str
    """
    with open(filename) as f:
        lines = f.readlines()

    starttag_line = None
    pattern_found = False
    # Escape the name so characters such as "." are matched literally,
    # and anchor the end so "remove" does not also match "removed".
    pattern = re.compile(r'ServerName\s+' + re.escape(servername) + r'\s*$', re.I)

    for line, content in enumerate(lines):
        if '<VirtualHost ' in content:
            starttag_line = line
            pattern_found = False  # a new section starts, forget earlier matches
        # look for the entry
        if pattern.search(content):
            pattern_found = True
        # at the next vhost end tag, remove the whole vhost entry
        if pattern_found and starttag_line is not None and '</VirtualHost>' in content:
            del lines[starttag_line:line + 1]
            break
    # If no entry matched, the content is returned unchanged.
    return "".join(lines)


filename = '/tmp/file.conf'

# new file content
print(remove_entry('remove', filename))