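I wrote a program in Python 3 using bs4 that successfully fetches the subcategories of a Wikipedia category. I can see the results when I print them, but I cannot write them to a file.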
from bs4 import BeautifulSoup
import requests
import csv
url = 'https://en.wikipedia.org/wiki/Category:proprietary software'
content = requests.get(url).content
soup = BeautifulSoup(content,'lxml')
noOFsubcategories = soup.find('p')
print('------------------------------------------------------------------')
print(noOFsubcategories.text+'------------------------------------------------------------------')
tag = soup.find('div', {'class' : 'mw-category'})
links = tag.findAll('a')
#print(links)
counter = 1
for link in links:
    print(str(counter) + " " + link.text)
    counter = counter + 1
with open('subcategories.csv', 'a') as f:
    f.write(links)
Answer 0 (score: 2)
Only a small change is needed: do the write inside the loop, so that each iteration writes one link to the file.
counter = 1
for link in links:
    print(str(counter) + " " + link.text)
    counter = counter + 1
    with open('subcategories.csv', 'a') as f:
        # write() needs a string, so write each link's href on its own line
        f.write(link['href'] + '\n')
Output:
/wiki/Category:Formerly_proprietary_software
/wiki/Category:Freeware
/wiki/Category:Oracle_software
/wiki/Category:Proprietary_cross-platform_software
/wiki/Category:Proprietary_database_management_systems
/wiki/Category:Proprietary_operating_systems
/wiki/Category:Proprietary_version_control_systems
/wiki/Category:Proprietary_wiki_software
/wiki/Category:Shareware
/wiki/Category:VMware
/wiki/Category:Warez
Better:
# no need to reopen the file on every iteration; open it once, above the loop
counter = 1
with open('subcategories.csv', 'a') as f:
    for link in links:
        print(str(counter) + " " + link.text)
        counter = counter + 1
        f.write(link['href'] + '\n')
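The attempt in the question fails because write() accepts only a string, while links is a list of tags. The following is a minimal sketch of that point, not the answerer's code: the hrefs list is a hypothetical stand-in for the scraped link['href'] values, the helper names and the temporary output path are made up for illustration, and the file is opened in 'w' mode rather than 'a' so that repeated runs do not pile up duplicate lines.

```python
import os
import tempfile

# Hypothetical stand-in for the scraped values; in the real script these
# would be link['href'] for each <a> tag found on the page.
hrefs = [
    "/wiki/Category:Freeware",
    "/wiki/Category:Shareware",
    "/wiki/Category:Warez",
]

def try_write_list(path, items):
    """Pass a whole list to write(), as the question does; report success."""
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(items)  # raises TypeError: write() expects a str
    except TypeError:
        return False
    return True

def write_lines(path, lines):
    """Open the file once and write each string on its own line."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")

# Assumed demo location; the thread itself writes to 'subcategories.csv'.
out_path = os.path.join(tempfile.gettempdir(), "subcategories_demo.txt")

list_write_ok = try_write_list(out_path, hrefs)
write_lines(out_path, hrefs)

with open(out_path, encoding="utf-8") as f:
    written = f.read().splitlines()
```

If appending across runs is actually wanted, switching the mode back to 'a' keeps the behaviour of the answer above.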
Answer 1 (score: 0)
First, build a list of lists holding each index and link text, then write it to the CSV file with csv.writer. Note the use of enumerate() below:
links = [[index, a.get_text()] for index, a in enumerate(tag.find_all('a'), start=1)]
with open('subcategories.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerows(links)
You can also simplify how the subcategory links are located by using a single CSS selector:
soup.select("div.mw-category a")
The complete code I am running:
import csv
from bs4 import BeautifulSoup
import requests
url = 'https://en.wikipedia.org/wiki/Category:proprietary software'
content = requests.get(url).content
soup = BeautifulSoup(content, 'lxml')
noOFsubcategories = soup.find('p')
tag = soup.find('div', {'class': 'mw-category'})
links = [[index, a.get_text()] for index, a in enumerate(tag.find_all('a'), start=1)]
with open('subcategories.csv', 'a') as f:
    writer = csv.writer(f)
    writer.writerows(links)
After running this code, subcategories.csv will contain one row per subcategory link, holding its index and its text.
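The enumerate() plus csv.writer approach can be tried without any network access. The sketch below is an illustration under stated assumptions rather than the answerer's exact code: names is a hypothetical list standing in for the a.get_text() results, the output path is a temporary demo file, and the file is opened with newline='' (which the csv module documentation recommends) and in 'w' mode instead of 'a'.

```python
import csv
import os
import tempfile

# Hypothetical link texts standing in for a.get_text() on each <a> tag.
names = ["Freeware", "Shareware", "Warez"]

# Pair every name with a 1-based index, as the list comprehension above does.
rows = [[index, name] for index, name in enumerate(names, start=1)]

# Assumed demo location; the thread itself writes to 'subcategories.csv'.
out_path = os.path.join(tempfile.gettempdir(), "subcategories_demo.csv")

# newline='' lets the csv module control line endings, which avoids
# blank rows between records on Windows.
with open(out_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)

# Read the file back to check what was written; csv.reader yields strings.
with open(out_path, newline="", encoding="utf-8") as f:
    read_back = list(csv.reader(f))
```

Note that the indices come back as strings when the file is read, since CSV stores no type information.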