Question

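I'm very new to Python; in fact, this is the first program I've ever written. It's a simple web scraper that goes through a website's sitemaps and collects data. The loop below runs many times without any problem; each file runs it more than 3,000 times. After about 100 files, I get an "index out of range" error, and I don't know why. This is the loop that causes the problem: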

for item in soup.find_all('loc'):
    newsItem = {
        'Category': '',
        'Title':    '',
        'Url':      ''
    }

    newsItem['Category'] = list(filter(None, item.text.replace(
        'http://www.nu.nl', '').replace('.html', '').split('/')))[0].title()
    newsItem['Title'] = list(filter(None, item.text.replace(
        'http://www.nu.nl', '').replace('.html', '').split('/')))[2].replace('-', ' ').title()
    newsItem['Url'] = item.text
    newsItems.append(newsItem)
    print_progress(counter, len(soup.find_all('loc')), 'Progress:')
    counter += 1

Error:

Traceback (most recent call last):
  File "theVerge.py", line 52, in <module>
    'http://www.nu.nl', '').replace('.html', '').split('/')))[2].replace('-', ' ').title()
IndexError: list index out of range

Answer 1

Without a traceback, my best guess is that it's this line:

newsItem['Title'] = list(filter(None, item.text.replace('http://www.nu.nl', '').replace('.html', '').split('/')))[2].replace('-', ' ').title()

I'm guessing the result of

list(filter(None, item.text.replace('http://www.nu.nl', '').replace('.html', '').split('/')))

doesn't have 3 items in it, so the [2] index is out of range.

Oh, by the way... those are some pretty gnarly string replacement lines. You may want to break them up or find a better approach, for readability. Reading this code, I have no idea what

newsItem['Title'] = list(filter(None, item.text.replace('http://www.nu.nl', '').replace('.html', '').split('/')))

would do, even if I had an example of the input :P
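For what it's worth, here is a rough sketch of how those lines could be broken up so that a short URL gets skipped instead of crashing the loop. The helper name `parse_news_url` and the sample URLs are made up for illustration, and I'm assuming the sitemap entries normally look like `http://www.nu.nl/<category>/<id>/<title>.html`. It is not an exact drop-in replacement: it uses `urllib.parse.urlparse` to get the path and only strips `.html` from the end of the title segment.

```python
from urllib.parse import urlparse


def parse_news_url(url):
    """Split a sitemap URL into category and title, or return None if it is too short."""
    # Keep only the non-empty path segments,
    # e.g. ['sport', '1234567', 'some-news-title.html'].
    parts = [p for p in urlparse(url).path.split('/') if p]
    if len(parts) < 3:
        # Fewer than 3 segments: parts[2] would raise IndexError.
        return None
    title = parts[2]
    if title.endswith('.html'):
        title = title[:-len('.html')]
    return {
        'Category': parts[0].title(),
        'Title': title.replace('-', ' ').title(),
        'Url': url,
    }


# Hypothetical sample URLs, made up for illustration only.
sample_urls = [
    'http://www.nu.nl/sport/1234567/some-news-title.html',
    'http://www.nu.nl/sport',
]

news_items = []
skipped = []  # URLs that did not have the expected shape
for url in sample_urls:
    item = parse_news_url(url)
    if item is None:
        skipped.append(url)
    else:
        news_items.append(item)
```

Printing `skipped` after a run should show which sitemap entries don't have three path segments, which is probably the quickest way to see what the offending input looks like.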
