Question

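I am trying to write my program's function so that the result of each regex findall() call is fed correctly into the HTML parser, as shown in the code snippet below. The problem is that I get

UnboundLocalError: local variable referenced before assignment

for headingList and imageList, depending on how I change the code. I think this is because the if statements don't continue past the first if block, since it is true. I tried using if heading and image and description and storyLink and date: and creating all the variables in one for loop, but when I run the program nothing happens. I think the cause is the structure of my code, or possibly the regex for the image variable, but I don't think so. Any help would be greatly appreciated :)

Edit: HTML snippet being used to parse from regex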

def extractNews():
    selection = listbox.curselection()

    if selection == (0,):
        # Read the webpage:
        response = urlopen("file:///E:/University/IFB104/InternetArchive/Archives/Sun,%20October%201st,%202017.html")
        html = response.read()
        #regex
        heading = findall((r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>'), str(html))
        image = findall((r'<span data-omni-sm-delegate="(.*)">(\n|\r)\s+<a href="(.*)></a>(\n|\r)\s+</span>'), str(html))  #<span data-omni-sm-delegate="(.*)">(\n|\r)\s+<a href="(.*)></a>(\n|\r)\s+</span>
        description = findall((r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>'), str(html))
        storyLink = findall((r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>'), str(html))
        date = findall((r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>'), str(html))

        if heading:
            headingList = []
            for link, title in heading:
                headingVariable = "%s" % (title)
                headingList.append(headingVariable)

        if image:
            imageList = []
            for link, title in image:
                imageVariable = "%s" % (title)
                imageList.append(imageVariable)

        if description:
            descriptionList = []
            for link, title in description:
                descriptionVariable = "%s" % (title)
                descriptionList.append(descriptionVariable)

        if storyLink:
            storyLinkList = []
            for link, title in storyLink:
                storyLinkVariable = "%s" % (title)
                storyLinkList.append(storyLinkVariable)

        if date:
            dateList = []
            for link, title in date:
                dateVariable = "%s" % (title)
                dateList.append(dateVariable)




        html_str = ('<!DOCTYPE html>\n'
        '<html>\n'
        '<head>\n'
        '<title>TechCrunch Archive - Sun, October 1st, 2017</title>\n'
        '</head>\n'
        '<body>\n'
        '<h1>' + headingList[0] + '</h1>\n'
        '<a href="'+ imageList[0]+'></a>\n'
        '<p>description goes here</p>\n'
        '<p>full story link goes here</p>\n'
        '<p>date goes here</p>\n'
        '<br><br>\n'
        '<h1>' + headingList[1] + '</h1>\n'
        'image goes here\n'
        '<p>description goes here</p>\n'
        '<p>full story link goes here</p>\n'
        '<p>date goes here</p>\n'
        '<br><br>\n'
        '<h1>' + headingList[2] + '</h1>\n'
        'image goes here\n'
        '<p>description goes here</p>\n'
        '<p>full story link goes here</p>\n'
        '<p>date goes here</p>\n'
        '<br><br>\n'
        '</body>\n'
        '</html>)')

        Html_file = open("ExtractedContent/Sun, October 1st, 2017 - Extracted.html", "w")
        Html_file.write(html_str)
        Html_file.close()

Answer 1

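"I think this is because the if statements don't continue past the first if block, since it is true."

That would be the case with if / elif, not with separate if statements. Your conditions are not true, so the lists are probably never created (hard to say without the HTML): findall found nothing and returned an empty list, which evaluates to False.

"I tried using if heading and image and description and storyLink and date: and creating all the variables in one for loop, but when I run the program nothing happens."

Nothing happens because not all of the conditions are True.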
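To make the point above concrete, here is a minimal sketch of how findall behaves when a pattern matches nothing. The sample string is hypothetical (the real page is not shown in the question), and the image pattern is shortened for illustration.

```python
from re import findall

# Hypothetical sample; the real page's HTML is not shown in the question.
sample = '<h2 class="post-title"><a href="/story">Title</a></h2>'

heading = findall(r'<h2 class="post-title"><a href="(.*?)".*?>(.*?)</a></h2>', sample)
# Shortened version of the image pattern; nothing in the sample matches it.
image = findall(r'<span data-omni-sm-delegate="(.*)">', sample)

print(heading)      # one (link, title) tuple
print(image)        # empty list
print(bool(image))  # an empty list is falsy

# A combined condition is only true when every list is non-empty.
if heading and image:
    result = "all matched"
else:
    result = "at least one pattern matched nothing"
print(result)
```

Because each if in the question is independent, a true first condition does not stop the later ones from being evaluated; only the lists whose pattern matched something get created.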

Answer 2

Because image is [], the if image test fails and imageList is never assigned.

Check the regex used for image. Better yet, use a proper parser (e.g., HTMLParser).
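A minimal sketch of that failure and one possible fix, assuming the goal is simply to collect the titles: create the list before the loop, so the name exists even when there are no matches. The function names broken and fixed are hypothetical and only illustrate the pattern.

```python
def broken(matches):
    # Mirrors the question: the list is only created when matches is non-empty.
    if matches:
        titles = []
        for link, title in matches:
            titles.append(title)
    return titles  # raises UnboundLocalError when matches is empty


def fixed(matches):
    # The list always exists; looping over an empty list just does nothing.
    titles = []
    for link, title in matches:
        titles.append(title)
    return titles


caught = False
try:
    broken([])
except UnboundLocalError:
    caught = True

print(caught)
print(fixed([]))
print(fixed([("/story", "Title")]))
```

Note that the question's html_str also indexes headingList[0] through headingList[2], so even with this change the code would still need at least three matches, or a length check, to avoid an IndexError.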
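As a rough sketch of the parser suggestion, assuming Python 3 (where the class lives in html.parser) and assuming the headings look like the h2 elements targeted by the question's regex, something along these lines could collect the links and titles. The class name PostTitleParser and the sample bytes are hypothetical. One more thing worth checking under the same Python 3 assumption: response.read() returns bytes, and str() on bytes produces the repr text (with a literal backslash followed by n instead of a real newline), so the (\n|\r) part of the image pattern cannot match; decoding the bytes avoids that.

```python
from html.parser import HTMLParser


class PostTitleParser(HTMLParser):
    """Collects (link, title) pairs from <h2 class="post-title"><a href=...>."""

    def __init__(self):
        super().__init__()
        self.in_title_h2 = False
        self.current_link = None
        self.current_text = []
        self.posts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h2" and attrs.get("class") == "post-title":
            self.in_title_h2 = True
        elif tag == "a" and self.in_title_h2:
            self.current_link = attrs.get("href")
            self.current_text = []

    def handle_data(self, data):
        # Only keep text that sits inside the heading's link.
        if self.current_link is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self.current_link is not None:
            title = "".join(self.current_text).strip()
            self.posts.append((self.current_link, title))
            self.current_link = None
        elif tag == "h2":
            self.in_title_h2 = False


# Hypothetical bytes standing in for response.read().
raw = (b'<h2 class="post-title"><a href="/a">First</a></h2>\n'
       b'<h2 class="other"><a href="/b">Skip</a></h2>\n')

parser = PostTitleParser()
# Decode instead of calling str(raw), which would yield the bytes repr.
parser.feed(raw.decode("utf-8"))
print(parser.posts)
```

The utf-8 encoding is also an assumption; the archived page may declare a different one.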

Python - UnboundLocalError: local variable referenced before assignment - regex / if else
