I'm new to web scraping, and I'm trying to write a simple script to get course names from a university course catalog table:
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary(r'C:\Program Files\Mozilla Firefox\firefox.exe')
driver = webdriver.Firefox(firefox_binary=binary)
url = 'https://courses.illinois.edu/schedule/2018/fall/CS'
driver.get(url)
course_names = []
for i in range(1, 69):
    if(float(i)%2 != 0): #odd row number
        curr_name = driver.find_element_by_css_selector('tr.odd:nth-child(i) > td:nth-child(2) > a:nth-child(1)').text
    else:
        curr_name = driver.find_element_by_css_selector('tr.even:nth-child(i) > td:nth-child(2) > a:nth-child(1)').text
    course_names.append(curr_name)
print(course_names)
driver.quit()
When I run it, I get the following error:
InvalidSelectorException: Message: Given css selector expression "tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)" is invalid: InvalidSelectorError: 'tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)' is not a valid selector: "tr.odd:nth-child(str(i)) > td:nth-child(2) > a:nth-child(1)"
I'm completely lost as to how to fix this. I just want it to step through the table. It doesn't seem to like i. I know this works:
Any suggestions?
Answer 0 (score: 2)
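Your code has several problems:
i is used as a literal character inside the selector string. Replace it with nth-child(" + str(i) + ") so that the value of i is inserted.
You are filtering odd and even rows both in the script and in the selector. Choose one, not both.
Locating each element and reading its text inside a loop is expensive. Scraping the text directly with a bit of JavaScript is a better approach: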
rows = driver.execute_script("""
return [].map.call(document.querySelectorAll('#default-dt tbody tr'), row => [
row.cells[0].innerText, /* Course number */
row.cells[1].innerText, /* Course title */
row.querySelector('[href]').href /* Course link */
]);
""")
# Each row comes back as [course number, course title, course link].
for code, title, href in rows:
    print(code, title, href)
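If you would rather keep a Python loop, the first two points can be sketched roughly as follows. This is a minimal sketch, not a tested scraper: the helper names row_link_selector and collect_course_names are made up for illustration, and the #default-dt table id is assumed from the script above. It calls find_element with an explicit strategy because newer Selenium releases no longer provide find_element_by_css_selector.

```python
# Strategy string passed to find_element; this is the same value as
# selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS_SELECTOR = "css selector"


def row_link_selector(i):
    """Build the CSS selector for the title link in table row i (1-based).

    Hypothetical helper. The row index is formatted into the string; a bare
    "i" inside the quotes would be sent to the browser literally, which is
    what makes the selector invalid.
    """
    # No .odd/.even class here: the row index alone picks the row.
    return "#default-dt tbody tr:nth-child({}) > td:nth-child(2) > a".format(i)


def collect_course_names(driver, row_count):
    """Read the title link text of rows 1..row_count from an open WebDriver.

    Hypothetical helper; `driver` is an already started Selenium WebDriver
    that has loaded the catalog page.
    """
    names = []
    for i in range(1, row_count + 1):
        element = driver.find_element(CSS_SELECTOR, row_link_selector(i))
        names.append(element.text)
    return names
```

With the driver from the question you would call collect_course_names(driver, 68) after driver.get(url). This still makes one round trip to the browser per row, so the JavaScript version above should stay the faster option.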