Question

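I'm trying to save all the hyperlink URLs from an online forum to a CSV file for a research project.

When I print the results of the HTML scrape, it seems to work fine, since it prints all the URLs I want, but I can't get them written to separate rows in the CSV.

I'm obviously doing something wrong, but I don't know what! So any help would be greatly appreciated.

Here is the code I wrote: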

import urllib2
from bs4 import BeautifulSoup
import csv
import re

soup = BeautifulSoup(urllib2.urlopen('http://forum.sex141.com/eforum/forumdisplay.php?fid=28&page=5').read())

urls = []

for url in soup.find_all('a', href=re.compile('viewthread.php')):
        print url['href']

csvfile = open('Ss141.csv', 'wb')
writer = csv.writer(csvfile)

for url in zip(urls):
        writer.writerow([url])

csvfile.close()

Answer 1

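You need to add the matches to the urls list; your loop prints each one but never appends it, so the list is still empty when you write the CSV: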

for url in soup.find_all('a', href=re.compile('viewthread.php')):
    print url['href']
    urls.append(url['href'])  # store the href string, not the whole <a> tag

You also don't need zip() here; loop over urls directly.
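To see why zip() gets in the way, here is a minimal sketch. It assumes Python 3 and uses two made-up hrefs in place of the scraped results; the helper `rows_written` is hypothetical and writes to an in-memory buffer instead of a file.

```python
import csv
import io

# Hypothetical sample hrefs standing in for the scraped results.
urls = ["viewthread.php?tid=1", "viewthread.php?tid=2"]

# zip() over a single list yields 1-tuples, not the strings themselves.
zipped = list(zip(urls))


def rows_written(items):
    """Write each item as a single-column row and return the CSV lines."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    for item in items:
        writer.writerow([item])
    return buf.getvalue().splitlines()


with_zip = rows_written(zip(urls))   # each cell is the text of a tuple
without_zip = rows_written(urls)     # each cell is the bare URL

print(zipped)
print(with_zip)
print(without_zip)
```

With zip(), every cell holds the string form of a one-element tuple rather than the URL itself, which is why iterating over the list directly is the simpler choice.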

It would be better to write the URLs out as you find them, rather than collecting them in a list first:

soup = BeautifulSoup(urllib2.urlopen('http://forum.sex141.com/eforum/forumdisplay.php?fid=28&page=5').read())

with open('Ss141.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    for url in soup.find_all('a', href=re.compile('viewthread.php')):
        writer.writerow([url['href']])

The with statement will close the file object for you when the block completes.
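The closing behaviour can be checked directly. The following is a rough sketch for Python 3, not the code from the question: the hrefs are made-up placeholders, the file goes into a temporary directory, and the file is opened in text mode with newline='' (which is what the csv module expects on Python 3 in place of Python 2's 'wb').

```python
import csv
import os
import tempfile

# Hypothetical hrefs standing in for the scraped links.
hrefs = ["viewthread.php?tid=1", "viewthread.php?tid=2"]

# Write into a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "Ss141.csv")

    # Python 3: text mode with newline='' replaces Python 2's 'wb'.
    with open(path, "w", newline="") as csvfile:
        writer = csv.writer(csvfile)
        for href in hrefs:
            writer.writerow([href])  # one URL per row

    # The file object is already closed once the with block has exited.
    closed_after_block = csvfile.closed

    # Read the file back to confirm each URL landed on its own row.
    with open(path, newline="") as csvfile:
        rows_read = list(csv.reader(csvfile))

print(closed_after_block)
print(rows_read)
```

No explicit csvfile.close() call is needed, and the file is closed even if an exception is raised partway through the loop.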
