I have a list of documents, each with the structure below. I need to delete this line:
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">
<pdf2xml producer="poppler" version="0.62.0">
<page number="1" position="absolute" top="0" left="0" height="1262" width="892">
</page>
</pdf2xml>
I would like to do this with Python code, because there are many files and deleting the line by hand would be very time-consuming.
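Answer 0 (score: 2)
You can read each file line by line and then write the lines back, leaving out the one you don't want. You only need to decide what exactly to delete: exactly the line you wrote? Always the second line? Every !DOCTYPE line? Only the first !DOCTYPE line? And so on.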
import os
import sys

# Assumes the first argument when running the script is a directory containing XML files
directory = sys.argv[1] if len(sys.argv) > 1 else "."
files = os.listdir(directory)

for f in files:
    # Skip anything that is not an XML file
    if not f.endswith(".xml"):
        continue
    # os.listdir returns bare file names, so join them with the directory
    path = os.path.join(directory, f)
    # Read the file content
    with open(path, 'r') as f_in:
        content = f_in.readlines()
    # Rewrite the original file
    with open(path, 'w') as f_out:
        for line in content:
            # The condition may differ based on what you really want to delete
            if line != "<!DOCTYPE pdf2xml SYSTEM \"pdf2xml.dtd\">\n":
                f_out.write(line)
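The question of which line to delete (the exact line, always the second line, every !DOCTYPE line, or only the first one) can be made explicit in code. The following is only a sketch: the function name drop_doctype and its mode parameter are hypothetical names invented for this illustration, not part of any library.

```python
EXACT_LINE = '<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">'


def drop_doctype(lines, mode="every"):
    """Return a copy of `lines` without the selected DOCTYPE line(s).

    `mode` is a hypothetical switch made up for this sketch:
      "exact"  - drop lines equal to EXACT_LINE (ignoring surrounding whitespace)
      "second" - drop the second line, but only if it is a DOCTYPE line
      "first"  - drop only the first DOCTYPE line
      "every"  - drop every DOCTYPE line
    """
    if mode not in ("exact", "second", "first", "every"):
        raise ValueError("unknown mode: " + repr(mode))

    kept = []
    seen_doctype = False
    for index, line in enumerate(lines):
        is_doctype = line.lstrip().startswith("<!DOCTYPE")
        if mode == "exact":
            drop = line.strip() == EXACT_LINE
        elif mode == "second":
            drop = index == 1 and is_doctype
        elif mode == "first":
            drop = is_doctype and not seen_doctype
        else:  # "every"
            drop = is_doctype
        seen_doctype = seen_doctype or is_doctype
        if not drop:
            kept.append(line)
    return kept
```

The list returned by drop_doctype(f_in.readlines()) could then be written back with f_out.writelines(...) in place of the inner loop.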
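Things to consider:
Do you really need or want to use Python? There are simpler options. For example, on Linux or Mac you can use sed (the -i '' form below is the BSD/macOS syntax; with GNU sed on Linux, use -i without the empty string):
for f in *.xml; do sed -i '' -n '/<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">/!p' "$f"; done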
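If you would rather stay in Python but want the same edit-in-place behaviour as sed -i, the standard library's fileinput module is one option. This is a minimal sketch, assuming every DOCTYPE line should go; the function name remove_doctype_in_place is made up for illustration.

```python
import fileinput


def remove_doctype_in_place(paths):
    """Rewrite each file in `paths`, leaving out lines that start with <!DOCTYPE."""
    paths = list(paths)
    if not paths:
        # With an empty list, fileinput would read from standard input instead
        return
    # inplace=True redirects print() output into the file currently being read
    with fileinput.input(files=paths, inplace=True) as stream:
        for line in stream:
            if not line.startswith("<!DOCTYPE"):
                print(line, end="")


# Example call (assumes the XML files are in the current directory):
#   import glob
#   remove_doctype_in_place(glob.glob("*.xml"))
```

Like sed -i, this overwrites the original files, so try it on a copy first.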
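Answer 1 (score: 0)
First, open the file:
f = open("yourfile.txt","r")
Next, read all the lines from the file:
lines = f.readlines()
Now you can close the file:
f.close()
Then reopen it in write mode:
f = open("yourfile.txt","w")
Then write the lines back, except for the line you want to delete. If you compare against the complete line instead of its prefix, remember that each line ends with "\n", which you may need to change to whatever line ending your file uses.
for line in lines:
    if not line.startswith('<!DOCTYPE'):
        f.write(line)
Finally, close the file again.
f.close()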
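The same open, read, reopen and write sequence can be written with "with" blocks, which close the file automatically even if an error occurs. This is a sketch under assumptions: the function name remove_doctype is hypothetical, and UTF-8 is assumed because the sample file declares encoding="UTF-8".

```python
from pathlib import Path


def remove_doctype(path, encoding="utf-8"):
    """Remove every line starting with <!DOCTYPE from one file.

    Returns the number of lines that were removed.
    """
    path = Path(path)
    # Read all lines; the file is closed when the with block ends
    with path.open("r", encoding=encoding) as f:
        lines = f.readlines()
    # Keep everything except the DOCTYPE declaration
    kept = [line for line in lines if not line.startswith("<!DOCTYPE")]
    # Reopen in write mode and write the remaining lines back
    with path.open("w", encoding=encoding) as f:
        f.writelines(kept)
    return len(lines) - len(kept)
```

Calling remove_doctype("yourfile.txt") would then replace all of the steps above.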