Question

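I need to scrape all 'a' tags with the class "result-title", and all 'span' tags with either the class 'result-price' or 'result-hood'. Then I want to write the output to a .csv file across multiple columns. The current code does not write anything to the csv file. This may be bad syntax, but I really can't see what I'm missing. Thanks.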

f = csv.writer(open(r"C:\Users\Sean\Desktop\Portfolio\Python - Web Scraper\RE Competitor Analysis.csv", "wb"))

def scrape_links(start_url):
    for i in range(0, 2500, 120):
        source = urllib.request.urlopen(start_url.format(i)).read()
        soup = BeautifulSoup(source, 'lxml')
        for a in soup.find_all("a", "span", {"class" : ["result-title hdrlnk", "result-price", "result-hood"]}):
            f.writerow([a['href']], span['results-title hdrlnk'].getText(), span['results-price'].getText(), span['results-hood'].getText() )
        if i < 2500:
            sleep(randint(30,120))
        print(i)


scrape_links('my_url')

Answer 1

If you want to find multiple tags with a single call to find_all, pass them in as a list. For example:

soup.find_all(["a", "span"])
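As a minimal sketch of how the list form behaves, the snippet below runs find_all over a small made-up HTML fragment. The markup and the choice of the built-in "html.parser" are assumptions for illustration, not the page from the question. As far as I can tell, the call in the question fails because find_all takes the tag name(s) as its first argument and an attribute filter as its second, so "span" ends up being read as a class filter on 'a' tags and nothing matches.

```python
from bs4 import BeautifulSoup

# Made-up fragment standing in for one search result (an assumption).
html = """
<p class="result-info">
  <a href="/apt/1.html" class="result-title hdrlnk">Sunny studio</a>
  <span class="result-price">$900</span>
  <span class="result-hood">(Downtown)</span>
  <span class="banner">unrelated</span>
</p>
"""

# Built-in parser; 'lxml' should also work if it is installed.
soup = BeautifulSoup(html, "html.parser")

# Tag names go in a list; class_ also accepts a list and matches any entry.
wanted = ["result-title", "result-price", "result-hood"]
tags = soup.find_all(["a", "span"], class_=wanted)

for tag in tags:
    print(tag.name, tag.get_text(strip=True))
```

The results come back as one flat list in document order, so this form alone does not tell you which price belongs to which link.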

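Without access to the page you are scraping, it is hard to give you a complete solution, but I suggest extracting one variable at a time and printing it to help you debug. For example: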

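# Pull out one piece at a time so each step can be checked on its own.
a = soup.find('a', class_='result-title')
a_link = a['href']
a_text = a.text

# 'result-price' (not 'results-price') matches the class named in the question.
spans = soup.find_all('span', class_=['result-price', 'result-hood'])

row = [a_link, a_text] + [s.text for s in spans]
print(row)  # verify we are getting the results we expect

f.writerow(row)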
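To get one CSV row per listing, one possible approach is to loop over whatever element wraps each result and look up the link, price and neighborhood inside it. The sketch below assumes each listing sits in its own `<li class="result-row">`; that container, the sample markup and the helper name `extract_rows` are all hypothetical, so adjust them to the real page.

```python
import csv
import io
from bs4 import BeautifulSoup

def extract_rows(html):
    """Return one [href, title, price, hood] list per listing in html."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # ASSUMPTION: each listing is wrapped in <li class="result-row">.
    for item in soup.find_all("li", class_="result-row"):
        link = item.find("a", class_="result-title")
        if link is None:
            continue  # skip containers that have no title link
        price = item.find("span", class_="result-price")
        hood = item.find("span", class_="result-hood")
        rows.append([
            link.get("href", ""),
            link.get_text(strip=True),
            price.get_text(strip=True) if price else "",
            hood.get_text(strip=True) if hood else "",
        ])
    return rows

# Made-up sample; the second listing deliberately has no neighborhood.
sample = """
<ul>
  <li class="result-row">
    <a href="/apt/1.html" class="result-title hdrlnk">Sunny studio</a>
    <span class="result-price">$900</span>
    <span class="result-hood">(Downtown)</span>
  </li>
  <li class="result-row">
    <a href="/apt/2.html" class="result-title hdrlnk">Quiet 2BR</a>
    <span class="result-price">$1500</span>
  </li>
</ul>
"""

rows = extract_rows(sample)

# An in-memory buffer keeps the sketch free of side effects.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["href", "title", "price", "hood"])
writer.writerows(rows)
print(buffer.getvalue())
```

To write to disk instead, swap the buffer for a file opened in text mode, for example open(path, "w", newline="", encoding="utf-8"). In Python 3 the csv module expects text mode, so the "wb" mode used in the question would likely raise a TypeError on the first writerow.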

Short & easy - soup.find_all not returning multiple tag elements
