BeautifulSoup output is not written to the CSV file

Posted: 2017-09-10 13:04:20

Tags: python-2.7 web-scraping beautifulsoup export-to-csv

I am trying to export the output of a web scraper to a CSV file. The code works and I get the correct output when I run it in the terminal, but nothing ends up in the CSV file.

Problem

When I remove the first for loop, it works fine, but I can't figure out exactly what is wrong in that part.

Code

import csv ; import requests
from bs4 import BeautifulSoup

outfile = open('ImplementTest8.csv','w')
writer = csv.writer(outfile)
writer.writerow(["job_link", "job_desc"])

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text
soup = BeautifulSoup(res,"lxml")
links = soup.find_all("a")

for li in soup.find('ul', class_='list-articles list').find_all('li'):
    level = li.find_all('dd', {'class': 'author'})[1].get_text()
    if "Graduate" in level:
        links = li.find_all("href")
        for link in links:
            if "career" in link.get("href") and 'COPENHAGEN' in link.text:
                item_link = link.get("href").strip()
                item_text = link.text.replace("View Position","").encode('utf-8').strip()
                writer.writerow([item_link, item_text])
                print(item_link, item_text)

Edited code

import csv ; import requests
from bs4 import BeautifulSoup

outfile = open('ImplementTest8.csv','w')
writer = csv.writer(outfile)
writer.writerow(["job_link", "job_desc"])

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text
soup = BeautifulSoup(res,"lxml")
links = soup.find_all("a")

for li in soup.find('ul', class_='list-articles list').find_all('li'):
    level = li.find_all('dd', {'class': 'author'})[1].get_text()
    if "Graduate" in level:
        links = li.find_all(href=True)
        for link in links:
            if "career" in link.get("href") and 'COPENHAGEN' in link.text:
                item_link = link.get("href").strip()
                item_text = link.text.replace("View Position","").encode('utf-8').strip()
                writer.writerow([item_link, item_text])
                print(item_link, item_text)

1 Answer:

Answer 0 (score: 2)

href is a tag attribute, not a tag name. If you want to make sure you only get links that have an href attribute, pass it as a keyword argument; otherwise, search by tag name.

links = li.find_all(href=True)
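
For reference, here is a minimal sketch of the whole script with that fix applied, using the same URL and selectors as in the question. The with statement is an addition on my part: it guarantees the file is closed and its buffer flushed, which is a common reason rows that print fine never show up in the CSV. Note also that the Python 2 csv module expects the file opened in binary mode.

import csv
import requests
from bs4 import BeautifulSoup

res = requests.get("http://implementconsultinggroup.com/career/#/6257").text
soup = BeautifulSoup(res, "lxml")

# 'wb' is the mode the Python 2 csv module expects; the with block
# closes the file on exit, flushing any buffered rows to disk
with open('ImplementTest8.csv', 'wb') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(["job_link", "job_desc"])
    for li in soup.find('ul', class_='list-articles list').find_all('li'):
        level = li.find_all('dd', {'class': 'author'})[1].get_text()
        if "Graduate" in level:
            # href=True filters on the attribute; "href" alone would be
            # treated as a (non-existent) tag name
            for link in li.find_all(href=True):
                if "career" in link.get("href") and 'COPENHAGEN' in link.text:
                    item_link = link.get("href").strip()
                    item_text = link.text.replace("View Position", "").encode('utf-8').strip()
                    writer.writerow([item_link, item_text])
                    print(item_link, item_text)

If you want to restrict the search to anchors specifically, the two filters can also be combined: li.find_all("a", href=True).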