I searched for this and tried to adapt some of the answers to my problem.
The text I want to remove is formatted like this:
text
text
rectangle
(
gab "BACKGROUND"
set("can be different")
origin(can be different)
width(can be different)
height(can be different)
)
text
text
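I am trying to remove the rectangle keyword together with everything between the parentheses, including the parentheses themselves. These rectangle blocks occur several times in a file.
So far I have the following: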
def removeBlock():
    for somefile in os.listdir(source_folder):
        if (somefile.startswith(('DTSPSM_')) and somefile.endswith(('.ddl'.lower()))):
            with open(os.path.join(source_folder, somefile), 'r') as file :
                lines = file.read()
                lines.strip()
                for lines in file:
                    blockstart = lines.index('rectangle')
                    del(lines[blockstart:blockstart+7])
            open(os.path.join(source_folder, somefile), 'w+').writelines(lines)
But it does not remove the rectangle lines. Can anyone help?
My current (working) code now looks like this:
import shutil
import tempfile
from pathlib import Path
import typing

source_folder = r'C:\Test'
test_file = r'C:\Test\DTSPSM_01.ddl'


def parseFile(src_file: typing.Union[Path, str], simulation: bool=False):
    print(src_file)
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with src_file.open('r') as input_file:
        counter = 0
        for line in input_file:
            if line.strip() == 'rectangle':
                # drop this line and the 7 lines that follow it
                counter = 7
                print(f'-{counter}-removing: {line}')
                continue
            elif counter > 0:
                counter -= 1
                print(f'-{counter}-removing: {line}')
                continue
            else:
                yield line


def clean_file(src_file: typing.Union[Path, str], simulation: bool=False):
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with tempfile.TemporaryDirectory() as tmpdirname:
        temp_filename = f'{tmpdirname}/{src_file.name}.txt'
        with open(temp_filename, 'w') as temp_file:
            for line in parseFile(src_file, simulation=simulation):
                temp_file.write(line)
        if not simulation:
            shutil.copy(temp_filename, src_file)


def main():
    for src_file in Path(source_folder).glob("DTSPSM_*.ddl"):
        print(f'***{src_file}***\n')
        clean_file(src_file)


if __name__ == "__main__":
    main()
Answer 0 (score: 1)
To avoid reading from and writing to the same file, you can write to a temporary file and then copy it over the original. That way the original is not left half-written if something interrupts the parsing.
import typing
from pathlib import Path


def parse_file(src_file: typing.Union[Path, str], simulation: bool=False):
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with src_file.open('r') as input_file:
        counter = 0
        for line in input_file:
            if line.strip() == 'rectangle':
                # start of a block: skip this line and the next 7
                counter = 7
                print(f'-{counter}-removing: {line}')
                if simulation:
                    # dry run: report the line but still pass it through
                    yield line
                continue
            elif counter > 0:
                counter -= 1
                print(f'-{counter}-removing: {line}')
                if simulation:
                    yield line
                continue
            else:
                yield line
This can easily be extended by passing in, as an extra argument, a dict that maps words to the number of lines to ignore. I've added a simulation argument so you can do a dry run without altering anything.
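One way the dict-based extension might look is sketched below. The names skip_blocks and skip_after are made up for this illustration, and the function takes any iterable of lines rather than a file path so it can be tried without touching the disk.

```python
import typing


def skip_blocks(
    lines: typing.Iterable[str],
    skip_after: typing.Mapping[str, int],
) -> typing.Iterator[str]:
    """Yield lines, dropping each keyword line plus the N lines after it.

    skip_after (hypothetical parameter) maps a keyword to the number of
    lines that should be dropped after the keyword line itself.
    """
    counter = 0
    for line in lines:
        keyword = line.strip()
        if keyword in skip_after:
            # keyword line: drop it and start counting down
            counter = skip_after[keyword]
            continue
        elif counter > 0:
            # still inside a block that is being dropped
            counter -= 1
            continue
        else:
            yield line


sample = ["text1\n", "rectangle\n", "(\n", "gab\n", ")\n", "text2\n"]
print(list(skip_blocks(sample, {"rectangle": 3})))
```

With the mapping {"rectangle": 3} the keyword line and the three lines after it are dropped, so only the two text lines remain.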
This can easily be tested by passing a file you know:
test_file = "DTSPSM_0.ddl"
for line in parse_file(test_file):
    print(line)
text1
text2
-7-removing: rectangle
-6-removing: (
-5-removing: gab "BACKGROUND"
-4-removing: set("can be different")
-3-removing: origin(can be different)
-2-removing: width(can be different)
-1-removing: height(can be different)
-0-removing: )
text3
text4
import shutil
import tempfile


def clean_file(src_file: typing.Union[Path, str], simulation: bool=False):
    if isinstance(src_file, str):
        src_file = Path(src_file)
    with tempfile.TemporaryDirectory() as tmpdirname:
        temp_filename = f'{tmpdirname}/{src_file.name}.txt'
        with open(temp_filename, 'w') as temp_file:
            for line in parse_file(src_file, simulation=simulation):
                temp_file.write(line)
        if not simulation:
            # only overwrite the original when this is not a dry run
            shutil.copy(temp_filename, src_file)
This writes the lines of the parsed file to a file in a temporary directory and, when finished, copies that temporary file over the original file.
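Note that the copy step itself can still be interrupted part-way. A possible variant, which is not part of the code above, writes the temporary file into the same directory as the original and then swaps it in with os.replace, which is an atomic rename on POSIX systems. The function name replace_file_atomically is hypothetical, and this is only a sketch under the assumption that the target directory is writable.

```python
import os
import tempfile
import typing
from pathlib import Path


def replace_file_atomically(target: Path, lines: typing.Iterable[str]) -> None:
    """Write lines to a temp file next to target, then rename it over target."""
    # Same directory as the target, so the rename stays on one filesystem.
    temp_file = tempfile.NamedTemporaryFile(
        'w', dir=target.parent, suffix='.tmp', delete=False
    )
    try:
        with temp_file:
            temp_file.writelines(lines)
        # The file is closed at this point; swap it into place.
        os.replace(temp_file.name, target)
    except BaseException:
        # Clean up the temp file if writing or renaming failed.
        os.unlink(temp_file.name)
        raise
```

If this helper were used, the lines produced by parse_file could be passed to it directly, since the generator has finished reading the original before the rename happens.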
clean_file can easily be tested with a single file too.
pathlib.Path.glob is easier than os.listdir:
# sourcedir is the folder that contains the .ddl files
for src_file in Path(sourcedir).glob("DTSPSM_*.ddl"):
    print(f'***{src_file}***\n')
    clean_file(src_file, simulation=True)
This can easily be tested separately by commenting out the last line.
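The fixed count of 7 in parse_file assumes every rectangle block has exactly the same number of lines. If that ever varies, a variant could instead drop everything up to the closing parenthesis. The sketch below is one such variant; the name drop_parenthesised_blocks is made up, and it assumes the closing parenthesis of a block always sits alone on its own line, as in the sample from the question.

```python
import typing


def drop_parenthesised_blocks(
    lines: typing.Iterable[str],
    keyword: str = 'rectangle',
) -> typing.Iterator[str]:
    """Yield lines, dropping each keyword line up to and including a lone ')'."""
    inside_block = False
    for line in lines:
        stripped = line.strip()
        if not inside_block and stripped == keyword:
            # start of a block: drop the keyword line itself
            inside_block = True
            continue
        if inside_block:
            # assumption: the block ends at a line that is exactly ')'
            if stripped == ')':
                inside_block = False
            continue
        yield line
```

Lines such as set("can be different") end with a parenthesis but are not exactly ")" once stripped, so they do not close the block early.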