I tried my best, but the code below keeps failing with a syntax error.
import urllib.request
import re
import csv
from bs4 import BeautifulSoup
from bs4 import NavigableString
from unicodedata import normalize
url = input('Please paste the link here: ')
html = urllib.request.urlretrieve(url)
html_file = open(html[0])
soup = BeautifulSoup(html_file, 'html5lib')
def contains_href(tag):
    return tag.find('a', href=True)
scrollables = [table in soup.find_all('table', class_='sc_courselist') if contains_href(table)]
def num_name_unit(tag):
    td_num = tag.find('td', href=True)
    num = normalize('NFKD', td_num.string.strip())
    td_name = tag.find('td', class_=False)
    name = normalize('NFKD', td_name.string.strip())
    td_unit = tag.find('td', class_='hourscol')
    unit = normalize('NFKD', td_unit.string.strip())
    row = ['Course Number: {0} | Course Name: {1} | Course Unit: {2}'.format(num, name, unit)]
    return row
dic_rows = {scrollable.find_previous_siblings(re.compile('h'), class_=False, limit=1).string.strip(): list(num_name_unit(tr) for tr in scrollable.find_all('tr', contains_href)) for scrollable in scrollables}
I expected the terminal to print the prompt "Please paste the link here: ". Instead, it reports "invalid syntax" at scrollables = [table in soup.find_all('table', class_='sc_courselist') if contains_href(table)].
Answer 0 (score: 0)
You are missing the for part of the list comprehension. It should be:
[table for table in soup.find_all('table', class_='sc_courselist') if contains_href(table)]
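To see why the original line fails before the input prompt ever appears, here is a minimal sketch that needs neither a network connection nor BeautifulSoup. The tables list and the has_link flag are hypothetical stand-ins for the parsed table tags, and contains_href is simplified to read that flag. Python parses the whole file before running any of it, so a malformed comprehension stops the script before input() is reached.

```python
# Hypothetical stand-in data: each dict plays the role of one <table> tag.
tables = [
    {"id": "t1", "has_link": True},
    {"id": "t2", "has_link": False},
    {"id": "t3", "has_link": True},
]


def contains_href(table):
    # Simplified stand-in for the question's helper; it only reads a flag.
    return table["has_link"]


# Broken form from the question: there is no "for table" before "in", so the
# parser reads "table in tables if ..." as a conditional expression that is
# missing its "else" branch and rejects it.
broken = "[table in tables if contains_href(table)]"
try:
    compile(broken, "<example>", "eval")
    broken_is_valid = True
except SyntaxError:
    broken_is_valid = False

# Corrected form: expression, then "for <name> in <iterable>", then the filter.
scrollables = [table for table in tables if contains_href(table)]
kept_ids = [table["id"] for table in scrollables]

print(broken_is_valid)
print(kept_ids)
```

The same shape applies to the real code: the leading table is the value kept for each item, for table in soup.find_all(...) supplies the items, and the trailing if contains_href(table) filters them.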