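Hi everyone, I am trying to extract the text from multiple websites. Everything runs without errors, but when I run the script only one website gets extracted from the domain list I created, which holds 3 websites. What am I doing wrong? I need the text of every item in domain written to the file.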
Thanks
from bs4 import BeautifulSoup
import requests
import urllib3
import certifi
http = urllib3.PoolManager(
    cert_reqs='CERT_REQUIRED',
    ca_certs=certifi.where())
domain = ('https://www.betfair.com/exchange/', 'https://docs.python.org/3/library/urllib.parse.html', 'https://anaconda.org/pypi/urllib3')
for url in domain:
    page = requests.get(url, verify=True)
    soup = BeautifulSoup(page.content, 'html.parser')
    content = (soup.get_text().encode('utf-8'))
    with open("article.txt", "w") as wa, open("article.txt", "r") as ra, open('outfile.txt', "w") as outfile:
        wa.write(content)
        for line in ra:
            if not line.strip(): continue
            outfile.write(line)
Answer 0 (score: -1)
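I believe you are overwriting the file every time: mode "w" truncates a file each time it is opened, so only the last write survives. That is why you should open the file in append mode, like this: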
with open('filename.txt', 'a') as f:
    ...
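For a fuller picture, here is a minimal sketch of the loop with the overwrite removed, assuming the goal is a single outfile.txt that holds the non-blank lines of all the pages. Instead of append mode it opens the output file once, before the loop, which avoids the same problem and also keeps the file from growing across repeated runs. The names fetch_text and save_pages are made up for this illustration, the timeout value is an arbitrary assumption, and the intermediate article.txt from the question is dropped because the text can be filtered in memory.

```python
def fetch_text(url):
    """Download one page and return its visible text (needs requests and bs4)."""
    # Imported here so save_pages can be tried without these packages installed.
    import requests
    from bs4 import BeautifulSoup

    page = requests.get(url, timeout=10)  # timeout is an assumed value
    page.raise_for_status()
    return BeautifulSoup(page.content, "html.parser").get_text()


def save_pages(urls, out_path, fetch=fetch_text):
    """Write the non-blank lines of every page to out_path."""
    # Opened once, before the loop: "w" truncates only here, so each page
    # is written after the previous one instead of replacing it.
    with open(out_path, "w", encoding="utf-8") as outfile:
        for url in urls:
            for line in fetch(url).splitlines():
                if line.strip():
                    outfile.write(line + "\n")
```

With the tuple from the question, the call would be save_pages(domain, "outfile.txt"). The fetch parameter exists only so the writing logic can be checked with canned text instead of real downloads.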
Hope that helps.