Question

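So I am writing code to convert a particular XML document into an HTML document for presenting a story. I have managed to get most of the way there, but when I join a list into a string and append that new string to another list, that list ends up empty. I have tried to troubleshoot the root of the problem with my limited understanding, but so far I have come up short. I will show you my code and the area where I think the problem lies.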

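I have already fixed one thing I noticed, where the variable I needed was not the one I was using, but I have gone through the rest of the code and cannot find anything else like that.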


import codecs
import re

fileIn = codecs.open("differenceInAbility.xml", "r", "utf-8")
text = fileIn.read()
fileIn.close()

chapterTitle = re.findall(r'<chapter number="(\d)" name="(.+?)">', text)
chapters = re.findall(r'<chapter number="\d" name=".+?">(.+?)</chapter>', text, flags=re.DOTALL)
paragraphs = re.findall(r"<paragraph>(.+?)</paragraph>", text, flags=re.DOTALL)

cleanParagraphs = []
for entry in paragraphs:
    cleanup = re.sub(r"\r\n[ ]+", " ", entry)
    cleanup2 = re.sub(r"[ ]+", " ", cleanup)
    cleanParagraphs.append(cleanup2)
chaptersHTML = []
chapterCounter = 1
for entry in chapters:
    if chapterTitle[0] == r"\d+":
        chapterHTML = "<h1> Chapter " + chapterCounter + " - " + chapterTitle[1] + "</h1>"
        chapterTitle.pop(0)
        chapterTitle.pop(1)
        paragraphsHTML = []
        for paragraph in cleanParagraphs:
            if paragraph in entry:
                p = "<p>" + paragraph + "</p>"
                paragraphsHTML.append(p)
        allParagraphsHTML = "\n".join(paragraphsHTML)
        wholeSection = chapterHTML + allParagraphsHTML
        chaptersHTML.append(wholeSection)
        chapterCounter += 1


print(chaptersHTML)

The part I think is relevant is:

        paragraphsHTML = []
        for paragraph in cleanParagraphs:
            if paragraph in entry:
                p = "<p>" + paragraph + "</p>"
                paragraphsHTML.append(p)
        allParagraphsHTML = "\n".join(paragraphsHTML)
        wholeSection = chapterHTML + allParagraphsHTML
        chaptersHTML.append(wholeSection)

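The cleanParagraphs list has the correct contents: each paragraph of the XML document is its own entry in that list.

Could the problem be the if paragraph in entry check, because it does not recognize the paragraph as being part of the entry?

If so, how would I fix it? How can I make sure it knows which paragraph belongs to which chapter?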

Answer 1

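The contents of cleanParagraphs are not substrings of the original text, so of course they do not appear in the unaltered chapters values: the cleanup step has replaced the line breaks and collapsed the runs of spaces. You should process each chapter separately (including splitting it into paragraphs), so that you do not have to rediscover which paragraphs it contains. This also avoids mishandling paragraphs that happen to be identical in two chapters.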
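
A minimal sketch of that per-chapter approach follows. It is only an illustration, not the asker's final code: the sample XML string, the function name chapters_to_html and the two compiled patterns are assumptions made up for the example, and the number pattern is widened to \d+. Capturing the number, name and body of each chapter in one pattern also sidesteps two other problems in the posted loop: chapterTitle[0] is a tuple, so comparing it with the string r"\d+" is never true and the loop body never runs, and "<h1> Chapter " + chapterCounter would raise a TypeError because chapterCounter is an int.

```python
import re

# Hypothetical sample input standing in for differenceInAbility.xml.
SAMPLE_XML = """<story>
<chapter number="1" name="Beginnings">
    <paragraph>First line
        continues here.</paragraph>
    <paragraph>Second paragraph.</paragraph>
</chapter>
<chapter number="2" name="Endings">
    <paragraph>Second paragraph.</paragraph>
</chapter>
</story>"""

# One pattern captures number, name and body of each chapter together.
CHAPTER_RE = re.compile(
    r'<chapter number="(\d+)" name="(.+?)">(.+?)</chapter>', re.DOTALL
)
PARAGRAPH_RE = re.compile(r"<paragraph>(.+?)</paragraph>", re.DOTALL)


def chapters_to_html(text):
    """Return one HTML string per chapter, finding paragraphs chapter by chapter."""
    chapters_html = []
    for number, name, body in CHAPTER_RE.findall(text):
        heading = "<h1>Chapter " + number + " - " + name + "</h1>"
        paragraphs_html = []
        # Search only this chapter's body, so no "paragraph in entry" test is needed.
        for raw in PARAGRAPH_RE.findall(body):
            # Collapse line breaks and runs of whitespace into single spaces.
            clean = re.sub(r"\s+", " ", raw).strip()
            paragraphs_html.append("<p>" + clean + "</p>")
        chapters_html.append(heading + "\n" + "\n".join(paragraphs_html))
    return chapters_html


if __name__ == "__main__":
    for section in chapters_to_html(SAMPLE_XML):
        print(section)
```

Because the paragraphs are extracted from each chapter's own body, the identical "Second paragraph." in both sample chapters lands in the right place each time. For anything beyond a simple, regular file, a real XML parser such as the standard library's xml.etree.ElementTree would likely be more robust than regular expressions.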
