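Short & Easy - soup.find_all not returning multiple tag elements

Time: 2017-11-25 02:50:50

Tags: python web-scraping beautifulsoup

I need to scrape all 'a' tags with the class "result-title" and all 'span' tags with either the class 'result-price' or 'result-hood', then write the output to a .csv file across multiple columns. The current code doesn't write anything to the csv file. It may be a syntax error, but I really can't see what I'm missing. Thanks.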

f = csv.writer(open(r"C:\Users\Sean\Desktop\Portfolio\Python - Web Scraper\RE Competitor Analysis.csv", "wb"))

def scrape_links(start_url):
    for i in range(0, 2500, 120):
        source = urllib.request.urlopen(start_url.format(i)).read()
        soup = BeautifulSoup(source, 'lxml')
        for a in soup.find_all("a", "span", {"class" : ["result-title hdrlnk", "result-price", "result-hood"]}):
            f.writerow([a['href']], span['results-title hdrlnk'].getText(), span['results-price'].getText(), span['results-hood'].getText() )
        if i < 2500:
            sleep(randint(30,120))
        print(i)


scrape_links('my_url')

1 Answer:

Answer 0: (score: 1)

If you want to find multiple tags with a single call to find_all, you should pass the tag names in a list. For example:

soup.find_all(["a", "span"])
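To see how the list form combines with a class filter, here is a minimal sketch. The HTML snippet is made up for illustration (the real page's markup may differ), and it uses the built-in "html.parser" so it runs without lxml. Passing a list to class_ matches a tag that carries any one of the listed classes, and a tag with several classes (such as "result-title hdrlnk") matches on each class individually.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example; the real page may differ.
html = """
<li class="result-row">
  <a href="/apt/1.html" class="result-title hdrlnk">Sunny 2BR</a>
  <span class="result-price">$1200</span>
  <span class="result-hood"> (Downtown)</span>
  <span class="result-tags">pic</span>
</li>
"""

soup = BeautifulSoup(html, "html.parser")

# A list of tag names matches any of those tags; a list of classes
# matches any tag that has at least one of those classes.
wanted_classes = ["result-title", "result-price", "result-hood"]
matches = soup.find_all(["a", "span"], class_=wanted_classes)

found = [(tag.name, tag.get_text(strip=True)) for tag in matches]
print(found)
```

Note that the span with class "result-tags" is skipped, because its class is not in the list.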

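Without access to the page you are scraping, it is hard to give you a complete solution, but I suggest extracting one variable at a time and printing it to help you debug. For example: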

a = soup.find('a', class_ = 'result-title')
a_link = a['href']
a_text = a.text

spans = soup.find_all('span', class_ = ['result-price', 'result-hood'])

row = [a_link, a_text] + [s.text for s in spans]
print(row) # verify we are getting the results we expect

f.writerow(row)
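Building on the debugging steps above, one possible way to get one CSV row per listing is to loop over each result container and look up the link, price and neighborhood inside it. This is only a sketch under assumptions: the "li" container with class "result-row", the sample HTML, and the helper names text_or_blank, extract_rows and write_rows are all hypothetical and not taken from the real page. It also writes to an in-memory buffer; with a real file in Python 3, csv.writer expects text mode, e.g. open(path, "w", newline=""), rather than "wb".

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example; the real page may differ.
SAMPLE_HTML = """
<ul>
  <li class="result-row">
    <a href="/apt/1.html" class="result-title hdrlnk">Sunny 2BR</a>
    <span class="result-price">$1200</span>
    <span class="result-hood"> (Downtown)</span>
  </li>
  <li class="result-row">
    <a href="/apt/2.html" class="result-title hdrlnk">Cozy studio</a>
    <span class="result-price">$800</span>
  </li>
</ul>
"""


def text_or_blank(parent, name, css_class):
    """Return the stripped text of the first matching child, or '' if absent."""
    tag = parent.find(name, class_=css_class)
    return tag.get_text(strip=True) if tag else ""


def extract_rows(html):
    """Build one [link, title, price, neighborhood] row per result container."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for item in soup.find_all("li", class_="result-row"):
        link = item.find("a", class_="result-title")
        if link is None:
            continue  # skip containers that have no title link
        rows.append([
            link.get("href", ""),
            link.get_text(strip=True),
            text_or_blank(item, "span", "result-price"),
            text_or_blank(item, "span", "result-hood"),
        ])
    return rows


def write_rows(rows, fileobj):
    """Write a header line followed by the rows to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["link", "title", "price", "neighborhood"])
    writer.writerows(rows)


rows = extract_rows(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for a file opened with "w", newline=""
write_rows(rows, buffer)
print(buffer.getvalue())
```

Searching inside each container keeps the title, price and neighborhood of one listing on the same row, and a missing span becomes an empty cell instead of shifting the columns.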