I need to scrape all the 'a' tags with the class "result-title hdrlnk", all the 'span' tags with the classes 'result-price' and 'result-hood', and then write the output to a .csv file across multiple columns. The current code doesn't print anything to the csv file. It's probably a syntax error, but I really can't see what I'm missing. Thanks.
f = csv.writer(open(r"C:\Users\Sean\Desktop\Portfolio\Python - Web Scraper\RE Competitor Analysis.csv", "wb"))

def scrape_links(start_url):
    for i in range(0, 2500, 120):
        source = urllib.request.urlopen(start_url.format(i)).read()
        soup = BeautifulSoup(source, 'lxml')
        for a in soup.find_all("a", "span", {"class": ["result-title hdrlnk", "result-price", "result-hood"]}):
            f.writerow([a['href']], span['results-title hdrlnk'].getText(), span['results-price'].getText(), span['results-hood'].getText())
        if i < 2500:
            sleep(randint(30, 120))
            print(i)

scrape_links('my_url')
Answer 0 (score: 1)
If you want to find multiple tags with a single call to find_all, you should pass them in as a list. For example:
soup.find_all(["a", "span"])
Without access to the page you're scraping it's hard to give you a complete solution, but I'd suggest extracting one variable at a time and printing it to help you debug. For example:
a = soup.find('a', class_ = 'result-title')
a_link = a['href']
a_text = a.text
spans = soup.find_all('span', class_ = ['result-price', 'result-hood'])  # note: 'result-price', not 'results-price'
row = [a_link, a_text] + [s.text for s in spans]
print(row) # verify we are getting the results we expect
f.writerow(row)
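Putting the pieces together, here is a minimal, self-contained sketch of the corrected extraction logic. It assumes each listing sits inside its own container element (a hypothetical `<li class="result-row">` here; adjust the selectors to the real page), and it writes the CSV with `io.StringIO` just to show the mechanics. Two fixes worth noting against the original code: `find_all` takes the tag names as a list, and in Python 3 the CSV file should be opened with `"w", newline=""` rather than `"wb"`.

```python
import csv
import io
from bs4 import BeautifulSoup

# Stand-in HTML so the sketch runs on its own; the real source would come
# from urllib.request.urlopen(...).read() as in the question.
sample_html = """
<ul>
  <li class="result-row">
    <a class="result-title hdrlnk" href="/post/1">Cozy loft</a>
    <span class="result-price">$1200</span>
    <span class="result-hood">(downtown)</span>
  </li>
</ul>
"""

def extract_rows(html):
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Iterate per listing so the link, price, and hood stay on the same row.
    for listing in soup.find_all("li", class_="result-row"):
        a = listing.find("a", class_="result-title")          # matches "result-title hdrlnk"
        spans = listing.find_all("span", class_=["result-price", "result-hood"])
        rows.append([a["href"], a.get_text()] + [s.get_text() for s in spans])
    return rows

rows = extract_rows(sample_html)

# For a real file, use: open(path, "w", newline="") -- not "wb" in Python 3.
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

Each CSV row then carries the href, title, price, and neighborhood in separate columns, which is the multi-column layout the question asks for.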