BeautifulSoup absolute URLs printed to CSV

Time: 2017-02-24 04:12:38

Tags: python-3.x beautifulsoup urllib

I've gone through a lot of threads here trying to find a way to fix this code, but I can't seem to get it working. I'm trying to scrape the links from a website and then write them to a CSV. Here is the code:

I found an approach that gets me 95% of the way there, but I'm missing whatever it takes to get the href:

    from bs4 import BeautifulSoup
    import urllib.request
    import urllib.parse
    import csv

    j = urllib.request.urlopen("http://cnn.com")
    soup = BeautifulSoup(j, "lxml") 
    data = soup.find_all('a', href=True)

    for url in soup.find_all('a', href=True):
        #print(url.get('href'))

        with open('marcel.csv', 'w', newline='') as csvfile:
            write = csv.writer(csvfile)
            write.writerows(data)

1 answer:

Answer 0: (score: 0)

This is probably what you're trying to do.

    from bs4 import BeautifulSoup
    import requests  # simpler to work with than urllib
    import csv

    # Fetch the page and parse it
    j = requests.get("http://cnn.com").content
    soup = BeautifulSoup(j, "lxml")

    # Collect every href value into a list
    data = []
    for url in soup.find_all('a', href=True):
        print(url['href'])
        data.append(url['href'])

    print(data)

    # writerows() expects a sequence of rows, so wrap each link in its own
    # list; otherwise every character of the string becomes its own column.
    with open("marcel.csv", 'w', newline='') as csvfile:
        write = csv.writer(csvfile)
        write.writerows([link] for link in data)
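
Since the title asks for absolute URLs, here is a minimal sketch (not part of the original answer) that resolves each href against the page's base URL with urllib.parse.urljoin before writing it out. The filename marcel.csv and the cnn.com URL are kept from the question; everything else follows the same requests + BeautifulSoup approach.

    from urllib.parse import urljoin

    from bs4 import BeautifulSoup
    import requests
    import csv

    base = "http://cnn.com"            # page whose links we scrape
    html = requests.get(base).content
    soup = BeautifulSoup(html, "lxml")

    # urljoin turns relative hrefs (e.g. "/politics") into absolute URLs;
    # hrefs that are already absolute are left untouched.
    links = [urljoin(base, a['href']) for a in soup.find_all('a', href=True)]

    with open("marcel.csv", 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerows([link] for link in links)

Because urljoin is a no-op on links that already include a scheme and host, it is safe to apply it to every anchor on the page.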