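I've gone through a lot of threads here looking for a way to fix this code, but I can't seem to get it working. I'm trying to scrape the links from a website and write them to a CSV file. Here is the code:
I found an approach that gets me 95% of the way there, but I'm missing something when it comes to getting the href: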
from bs4 import BeautifulSoup
import urllib.request
import urllib.parse
import csv
j = urllib.request.urlopen("http://cnn.com")
soup = BeautifulSoup(j, "lxml")
data = soup.find_all('a', href=True)
for url in soup.find_all('a', href=True):
    #print(url.get('href'))
    with open('marcel.csv', 'w', newline='') as csvfile:
        write = csv.writer(csvfile)
        write.writerows(data)
Answer 0 (score: 0)
This is probably what you are trying to do.
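from bs4 import BeautifulSoup
import requests  # simpler to use than urllib
import csv

j = requests.get("http://cnn.com").content
soup = BeautifulSoup(j, "lxml")
data = []
# Collect only the href value of each link, not the whole <a> tag
for url in soup.find_all('a', href=True):
    print(url['href'])
    data.append(url['href'])
print(data)
# newline='' avoids blank lines between rows on Windows
with open("marcel.csv", 'w', newline='') as csvfile:
    write = csv.writer(csvfile)
    # writerows expects a sequence of rows, so wrap each href in its own list;
    # a bare string would be split into one field per character
    write.writerows([href] for href in data)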
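
One thing to watch when writing the collected links: csv.writer treats every row as a sequence of fields, so a bare string is split into one field per character. Below is a small sketch of that behavior which runs without any network access. The two sample links and the rows_to_csv helper are made up for illustration and are not part of the original code.

```python
import csv
import io

# Hypothetical sample links standing in for the scraped href values.
links = ["http://cnn.com/world", "/politics"]


def rows_to_csv(rows):
    """Write rows to an in-memory CSV buffer and return the resulting text."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


# Bare strings: each string is iterated character by character,
# so every character ends up in its own field.
split_output = rows_to_csv(links)

# Each string wrapped in a list: one field (the full link) per row.
wrapped_output = rows_to_csv([link] for link in links)

print(split_output)
print(wrapped_output)
```

If you later want more columns (for example the link text next to the href), the same pattern applies: build one list per row and pass the list of rows to writerows.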