Question

I have a list of documents with the structure shown below. I need to delete this line from each of them: <!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">

<pdf2xml producer="poppler" version="0.62.0">
<page number="1" position="absolute" top="0" left="0" height="1262" width="892">

</page>
</pdf2xml>

I would like to do this with Python code, because there are many files and deleting the line by hand would be very time-consuming.


Answer 1

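You can read each file line by line and then write the lines back, leaving out the unwanted one. You just need to decide exactly what should be deleted: precisely the line you wrote? Always the second line? Every !DOCTYPE line? Only the first !DOCTYPE line? And so on.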

import os
import sys

# Assumes the first argument when running the script is a directory containing XML files
directory = sys.argv[1] if len(sys.argv) > 1 else "."
files = os.listdir(directory)

for f in files:
    # Skip files that are not XML
    if not f.endswith(".xml"):
        continue

    # os.listdir returns bare names, so build the full path before opening
    path = os.path.join(directory, f)

    # Read file content
    with open(path, 'r') as f_in:
        content = f_in.readlines()

    # Rewrite the original file
    with open(path, 'w') as f_out:
        for line in content:
            # The condition may differ based on what you really want to delete
            if line != '<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">\n':
                f_out.write(line)
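
The questions above (exact line, second line, every !DOCTYPE, first !DOCTYPE) each translate into a different condition. A minimal sketch of how they could look, assuming a hypothetical helper `filter_lines` and hypothetical rule names that are not part of the original script:

```python
DOCTYPE_LINE = '<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">'


def filter_lines(lines, rule):
    """Return the lines to keep under one of four illustrative rules."""
    kept = []
    doctype_dropped = False  # only consulted by the "first_doctype" rule
    for index, line in enumerate(lines):
        stripped = line.strip()
        if rule == "exact":
            # Drop only lines identical to the one in the question
            drop = stripped == DOCTYPE_LINE
        elif rule == "second_line":
            # Drop whatever sits on the second line (index 1)
            drop = index == 1
        elif rule == "every_doctype":
            # Drop every line that starts a DOCTYPE declaration
            drop = stripped.startswith("<!DOCTYPE")
        elif rule == "first_doctype":
            # Drop only the first DOCTYPE line encountered
            drop = stripped.startswith("<!DOCTYPE") and not doctype_dropped
        else:
            raise ValueError(f"unknown rule: {rule}")
        if drop:
            doctype_dropped = True
        else:
            kept.append(line)
    return kept
```

For the sample file in the question all four rules give the same result; they only diverge when a file has the declaration on another line or contains more than one.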

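Things to consider:

If the files are large, you may not want to load them into memory.
If, for example, you only ever want to delete the second line of each file, this approach is inefficient.
Do you really need or want to use Python? There are better solutions. For example, on Linux or Mac you can use sed (the -i '' form below is what macOS/BSD sed expects; GNU sed on Linux takes -i without the empty argument):
```
for f in *.xml; do sed -i '' -n '/<!DOCTYPE pdf2xml SYSTEM "pdf2xml.dtd">/!p' "$f"; done
```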
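
If memory is the concern and you still want Python, one option is to stream each file into a temporary file and then swap it in. This is only a sketch: the function name `remove_doctype_streaming` is made up for illustration, and UTF-8 is assumed because the sample file declares it.

```python
import os
import tempfile


def remove_doctype_streaming(path, encoding="utf-8"):
    """Copy `path` line by line into a temporary file, skipping DOCTYPE
    lines, then replace the original with the filtered copy."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that os.replace
    # does not have to cross filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" keeps the file's own line endings untouched
        with os.fdopen(fd, "w", encoding=encoding, newline="") as f_out, \
                open(path, "r", encoding=encoding, newline="") as f_in:
            for line in f_in:  # iterating a file reads one line at a time
                if not line.lstrip().startswith("<!DOCTYPE"):
                    f_out.write(line)
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy
        os.remove(tmp_path)
        raise
```

Only one line is held in memory at a time, and the original is replaced only after the filtered copy has been written completely. Note that mkstemp creates the file with owner-only permissions, so the rewritten file may not keep the original's mode.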

Answer 2

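First, open the file:

f = open("yourfile.txt","r")

Next, get all the lines from the file:

lines = f.readlines()

Now you can close the file:

f.close()

Then reopen it in write mode:

f = open("yourfile.txt","w")

Then write your lines back, except the one you want to delete. If you match the whole line instead of using startswith, you may need to change "\n" to whatever line ending your file uses.

for line in lines:
  if not line.startswith('<!DOCTYPE'):
    f.write(line)

Finally, close the file again.

f.close()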
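
The same steps can be collected into one function that uses with blocks, so the file is closed even if an error occurs partway through. This is a sketch rather than a definitive version: the name `remove_doctype_lines` is hypothetical, and UTF-8 is assumed because the sample file declares it.

```python
def remove_doctype_lines(path):
    """Rewrite `path` without lines that start with '<!DOCTYPE'.

    Returns the number of lines that were dropped.
    """
    # Read every line, then let the with block close the file
    with open(path, "r", encoding="utf-8") as f:
        lines = f.readlines()

    # Keep everything except DOCTYPE declarations
    kept = [line for line in lines if not line.startswith("<!DOCTYPE")]

    # Reopen in write mode and write the remaining lines back
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(kept)

    return len(lines) - len(kept)
```

Calling `remove_doctype_lines("yourfile.txt")` on a file like the one in the question should return 1 and leave the rest of the file unchanged.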
