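I am trying to scrape page titles from an old website.
The problem is that in some cases I get a null value.
So I tried adding a while loop that changes the URL and retries.
Is my while loop in the right place?
The code is as follows: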
from urllib.request import urlopen
from bs4 import BeautifulSoup
from openpyxl import Workbook
import os
import xlrd
import lxml

# set file location
os.chdir("/excel_files")
# set the name of the file
file_name = "old.xlsx"
# open workbook
workbook = xlrd.open_workbook(file_name)
# set existing worksheet
sheet = workbook.sheet_by_index(0)
temp_list = [20131022212405, 20090127003537, 2009012702352,]

for i in range(sheet.nrows):
    try:
        u = sheet.cell_value(i, 1)
        html = urlopen(u)
        bsObj = BeautifulSoup(html.read(), features='lxml')
        # get title
        title = str(bsObj.title)
        print('row no. ', i, 'title is :', title)
    except:
        title = 'null'
    while (title == 'null'):
        try:
            u = u.replace(temp_list[i], temp_list[i + 1])
            html = urlopen(u)
            bsObj = BeautifulSoup(html.read(), features='lxml')
            title = str(bsObj.title)
        except:
            print('title is :', title)
I keep getting null for every row, instead of only for the rows that are actually null.
Answer 0 (score: 0)
It looks like the indentation of the first try/except block inside the for loop (for i in range(sheet.nrows):) is wrong: try and except should be at the same indentation level.
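
Beyond the indentation, two other things in the posted code could plausibly produce "null" on every row: str.replace only accepts strings, so the integer entries in temp_list raise a TypeError that the bare except silently swallows, and the while loop has no exit once every replacement fails. The following is only a minimal sketch of one way to restructure the retry, not a drop-in fix. It assumes the timestamps are stored as strings and that each URL contains one of them; the helper names candidate_urls, first_title and fetch_title are made up for illustration.

```python
from urllib.request import urlopen


def candidate_urls(url, timestamps):
    """Yield the original URL, then copies with its timestamp swapped out."""
    yield url
    # Find which known timestamp (if any) appears in this URL.
    current = next((t for t in timestamps if t in url), None)
    if current is None:
        return
    for t in timestamps:
        if t != current:
            yield url.replace(current, t)


def first_title(url, timestamps, fetch_title):
    """Return the first non-empty title among the candidates, else None."""
    for candidate in candidate_urls(url, timestamps):
        try:
            title = fetch_title(candidate)
        except Exception:
            # Network or parsing failure: move on to the next candidate.
            title = None
        if title:
            return title
    return None


def fetch_title(url):
    """Download a page and return its <title> text, or None if it has none."""
    # Imported here so the retry logic above can be exercised without bs4.
    from bs4 import BeautifulSoup

    with urlopen(url, timeout=10) as response:
        soup = BeautifulSoup(response.read(), features="lxml")
    return soup.title.get_text(strip=True) if soup.title else None
```

Inside the for loop this would be called once per row, for example title = first_title(sheet.cell_value(i, 1), timestamps, fetch_title), where timestamps is the list from the question with its values quoted as strings. Because the number of attempts is bounded by the list length, a row whose title really is missing ends up as None instead of looping forever.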