Question

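I am trying to fetch page titles from an old website.

In some cases I get a null value, so I am trying to add a while loop that changes the URL and tries again.

Is my while loop in the right place?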

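The process is as follows:

open the file
get the URL
check the URL
get the title
print the title
while (title == null):
    replace part of the URL, then check the URL again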

from urllib.request import urlopen
from bs4 import BeautifulSoup
from openpyxl import Workbook
import os
import xlrd
import lxml

# set file location
os.chdir("/excel_files")

# set the name of the file
file_name = "old.xlsx"

# open workbook
workbook = xlrd.open_workbook(file_name)

# set existing worksheet
sheet = workbook.sheet_by_index(0)


temp_list = [20131022212405,20090127003537,2009012702352,]

for i in range(sheet.nrows):
    try:
        u = sheet.cell_value(i,1)
    html = urlopen(u)
    bsObj = BeautifulSoup(html.read(), features='lxml')
    # get title
    title = str(bsObj.title)
    print('row no. ',i, 'title is :' , title)
except:
    title = 'null'
while (title == 'null'):
    try:
        u = u.replace(temp_list[i], temp_list[i + 1])
        html = urlopen(u)
        bsObj = BeautifulSoup(html.read(), features='lxml')
        title = str(bsObj.title)
    except:
        print('title is :',title)

I get null every time, not only for the rows that really are null.

Answer 1

It looks like the indentation of the first try/except block, inside the for i in range(sheet.nrows): loop, is wrong: try and except should be at the same level.
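For illustration, here is a minimal sketch of how the retry could be arranged once the indentation is fixed. It is not necessarily what the asker's final code should look like. The helper names fetch_title and title_with_fallbacks are made up for this example, and a missing title is represented as None rather than the string 'null'. It also assumes the entries of temp_list are meant to be substrings of the URL, so they are passed as strings: str.replace() raises a TypeError when given integers, and the bare except in the question would hide that error and leave title at 'null' on every pass. The retry loop uses its own index j instead of the row index i.

```python
from urllib.request import urlopen


def fetch_title(url):
    """Return the text of the page's <title>, or None if it cannot be read."""
    # Imported here so the retry logic below does not depend on bs4 being installed.
    from bs4 import BeautifulSoup
    try:
        with urlopen(url) as html:
            soup = BeautifulSoup(html.read(), features='lxml')
    except Exception:  # bad URL, network error, HTTP error, ...
        return None
    return soup.title.get_text(strip=True) if soup.title else None


def title_with_fallbacks(url, fragments, fetch=fetch_title):
    """Try url as given, then swap fragments[j] for fragments[j + 1] in turn."""
    title = fetch(url)
    j = 0  # index into fragments, independent of the spreadsheet row
    while title is None and j < len(fragments) - 1:
        url = url.replace(fragments[j], fragments[j + 1])
        title = fetch(url)
        j += 1
    return title


# Hypothetical usage with the sheet and list from the question:
# temp_list = ['20131022212405', '20090127003537', '2009012702352']
# for i in range(sheet.nrows):
#     title = title_with_fallbacks(sheet.cell_value(i, 1), temp_list)
#     print('row no.', i, 'title is:', title)
```

Because the while loop stops either when a title is found or when the list is exhausted, it cannot spin forever on a row whose title is genuinely missing.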

How do I check a parameter for each item of an array in a while loop?
