Remove a block of text from a file

Time: 2017-12-15 10:15:26

Tags: python-3.x

I searched for this and tried to adapt some of the answers to my problem.

The text I want to remove is formatted like this:

text
text 
            rectangle 
           (
                gab "BACKGROUND" 
                set("can be different") 
                origin(can be different) 
                width(can be different)
                height(can be different)
            )
text
text

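I am trying to remove the rectangle and everything between the parentheses, including the parentheses themselves. These rectangles occur several times in the file.

So far I have the following: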

def removeBlock(): 

for somefile in os.listdir(source_folder):
    if (somefile.startswith(('DTSPSM_')) and somefile.endswith(('.ddl'.lower()))):
        with open(os.path.join(source_folder, somefile), 'r') as file :

            lines = file.read()
            lines.strip()
            for lines in file:
                blockstart = lines.index('rectangle')         
                del(lines[blockstart:blockstart+7])               
                open(os.path.join(source_folder, somefile), 'w+').writelines(lines) 

But it does not remove the rectangle lines. Can anyone help?

My current (working) code now looks like this:

import shutil
import tempfile
from pathlib import Path
import typing

source_folder = r'C:\Test'
test_file = r'C:\Test\DTSPSM_01.ddl'

def parseFile(src_file: typing.Union[Path, str], simulation: bool=False):
    print(src_file)
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with src_file.open('r') as input_file:
        counter = 0
        for line in input_file:
            if line.strip() =='rectangle':
                counter = 7
                print(f'-{counter}-removing: {line}')
                continue
            elif counter > 0:
                counter -= 1
                print(f'-{counter}-removing: {line}')
                continue
            else:
                yield line

def clean_file(src_file:typing.Union[Path, str], simulation: bool=False):
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with tempfile.TemporaryDirectory() as tmpdirname:
        temp_filename = f'{tmpdirname}/{src_file.name}.txt'
        with open(temp_filename, 'w') as temp_file:
            for line in parseFile(src_file, simulation=simulation):
                temp_file.write(line)
        if not simulation:
            shutil.copy(temp_filename, src_file)

def main():

    for src_file in Path(source_folder).glob("DTSPSM_*.ddl"):
        print(f'***{src_file}***\n')
        clean_file(src_file)


if __name__== "__main__":
    main()         

1 answer:

Answer 0 (score: 1)

To avoid reading from and writing to the same file, you can write to a temporary file and then copy it over the original. This way you also don't need to worry about what happens when something interrupts the parsing.

Parsing a single file:

def parse_file(src_file: typing.Union[Path, str], simulation: bool=False):
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with src_file.open('r') as input_file:
        counter = 0
        for line in input_file:
            if line.strip() =='rectangle':
                counter = 7
                print(f'-{counter}-removing: {line}')
                if simulation:
                    yield line
                continue
            elif counter > 0:
                counter -= 1
                print(f'-{counter}-removing: {line}')
                if simulation:
                    yield line
                continue
            else:
                yield line

This can easily be extended by passing a dict of trigger words and the number of lines to ignore for each as an extra argument. I've added a simulation argument so you can do a dry run without altering anything.
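A minimal sketch of what such an extension might look like. The names `parse_lines` and `skip_rules` are hypothetical and not part of the answer above; `skip_rules` is assumed to map a stripped trigger word to the number of lines to drop after it, and the function takes any iterable of lines so it can be tried without touching a file.

```python
import typing


def parse_lines(
    lines: typing.Iterable[str],
    skip_rules: typing.Dict[str, int],
) -> typing.Iterator[str]:
    """Yield lines, dropping each trigger line and the N lines that follow it.

    skip_rules is a hypothetical mapping: stripped trigger word -> number of
    following lines to drop (7 for the 'rectangle' block in the question).
    """
    counter = 0
    for line in lines:
        keyword = line.strip()
        if keyword in skip_rules:
            # Trigger line: drop it and arm the counter for the block body.
            counter = skip_rules[keyword]
            continue
        elif counter > 0:
            # Still inside a block that is being removed.
            counter -= 1
            continue
        else:
            yield line


sample = [
    "text1\n",
    "    rectangle \n",
    "    (\n",
    '        gab "BACKGROUND"\n',
    '        set("can be different")\n',
    "        origin(can be different)\n",
    "        width(can be different)\n",
    "        height(can be different)\n",
    "    )\n",
    "text2\n",
]

cleaned = list(parse_lines(sample, {"rectangle": 7}))
print(cleaned)
```

Because the counts live in the dict, a second block type only needs another entry, assuming its length is fixed as well.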

This can be easily tested by passing a file you know:

test_file = "DTSPSM_0.ddl"
for line in parse_file(test_file):
    print(line)

text1
text2 
-7-removing:             rectangle 
-6-removing:            (
-5-removing:                 gab "BACKGROUND" 
-4-removing:                 set("can be different") 
-3-removing:                 origin(can be different) 
-2-removing:                 width(can be different)
-1-removing:                 height(can be different)
-0-removing:             )
text3
text4

def clean_file(src_file:typing.Union[Path, str], simulation: bool=False):
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with tempfile.TemporaryDirectory() as tmpdirname:
        temp_filename = f'{tmpdirname}/{src_file.name}.txt'
        with open(temp_filename, 'w') as temp_file:
            for line in parse_file(src_file, simulation=simulation):
                temp_file.write(line)
        if not simulation:
            shutil.copy(temp_filename, src_file)

This writes the lines of the parsed file to a file in a temporary directory and, when finished, copies that temporary file over the original.

This can easily be tested with a single file too.
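One way such a single-file test might look, as a sketch. It uses condensed copies of `parse_file` and `clean_file` (without the print calls) so that it runs on its own; the sample content and the file name `DTSPSM_sample.ddl` are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def parse_file(src_file: Path, simulation: bool = False):
    # Condensed copy of parse_file above, without the print calls.
    with src_file.open('r') as input_file:
        counter = 0
        for line in input_file:
            if line.strip() == 'rectangle':
                counter = 7
            elif counter > 0:
                counter -= 1
            else:
                yield line
                continue
            # Only reached for lines that belong to a rectangle block.
            if simulation:
                yield line


def clean_file(src_file: Path, simulation: bool = False):
    # Condensed copy of clean_file above.
    with tempfile.TemporaryDirectory() as tmpdirname:
        temp_filename = Path(tmpdirname) / f'{src_file.name}.txt'
        with open(temp_filename, 'w') as temp_file:
            for line in parse_file(src_file, simulation=simulation):
                temp_file.write(line)
        if not simulation:
            shutil.copy(temp_filename, src_file)


# Made-up sample content with one rectangle block (illustration only).
SAMPLE = (
    "text1\n"
    "text2\n"
    "    rectangle \n"
    "    (\n"
    '        gab "BACKGROUND"\n'
    '        set("a")\n'
    "        origin(1 2)\n"
    "        width(3)\n"
    "        height(4)\n"
    "    )\n"
    "text3\n"
)

with tempfile.TemporaryDirectory() as workdir:
    sample_file = Path(workdir) / 'DTSPSM_sample.ddl'
    sample_file.write_text(SAMPLE)

    # Dry run: the file on disk must stay untouched.
    clean_file(sample_file, simulation=True)
    after_dry_run = sample_file.read_text()

    # Real run: the rectangle block is removed from the file.
    clean_file(sample_file)
    after_real_run = sample_file.read_text()

print(after_real_run)
```

Running against a throwaway copy in a temporary directory keeps the real `.ddl` files out of the experiment.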

Finding the files:

pathlib.Path.glob is easier than os.listdir, since the pattern handles both the prefix and the extension check:

for src_file in Path(sourcedir).glob("DTSPSM_*.ddl"):
    print(f'***{src_file}***\n')
    clean_file(src_file, simulation=True)

This can easily be tested separately by commenting out the last line.
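To check the file discovery on its own, something like the following sketch could be used. It creates empty files with made-up names in a temporary directory (assuming the `.ddl` extension from the question) and compares the glob pattern with the `os.listdir` filter from the question.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as sourcedir:
    # Made-up file names: two that should match and two that should not.
    for name in ('DTSPSM_01.ddl', 'DTSPSM_02.ddl', 'OTHER_01.ddl', 'DTSPSM_03.txt'):
        (Path(sourcedir) / name).touch()

    # The glob pattern covers the prefix and the extension in one expression.
    via_glob = sorted(p.name for p in Path(sourcedir).glob('DTSPSM_*.ddl'))

    # The equivalent filter written with os.listdir, as in the question.
    via_listdir = sorted(
        name for name in os.listdir(sourcedir)
        if name.startswith('DTSPSM_') and name.endswith('.ddl')
    )

print(via_glob)
print(via_listdir)
```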