Question

I'm testing the following script:

import re
import requests
from bs4 import BeautifulSoup
import os
import fileinput

Link = 'https://en.wikipedia.org/wiki/Category:1990'
q = requests.get(Link)
soup = BeautifulSoup(q.text)
#print soup
subtitles = soup.findAll('div',{'class':'links'})
#print subtitles


with  open("Anilinks.txt", "w") as f:
    for link in subtitles:
        x = link.find_all('a', limit=26)
        for a in x:
            url = a['href']
            f.write(url+'\n')

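I'm trying to copy each link into a text file. The script looks like it should work, but it doesn't actually do anything.

Can anyone help me get this working? Thanks!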

Answer 1

You can write a string to a text file like this:

with open("yourfile.txt", "w") as f: f.write(yourstr)

If you don't want to overwrite the file, pass "a" as the second argument to open. See https://docs.python.org/2/library/functions.html#open.
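To make the difference between the two modes concrete, here is a minimal sketch. The temporary directory and the strings written are only there for illustration; they are not part of the original question.

```python
import os
import tempfile

# Hypothetical scratch location so the demo does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "yourfile.txt")

with open(path, "w") as f:   # "w" creates the file (or empties an existing one)
    f.write("first\n")

with open(path, "w") as f:   # opening with "w" again discards "first"
    f.write("second\n")

with open(path, "a") as f:   # "a" keeps existing content and appends to it
    f.write("third\n")

with open(path) as f:
    contents = f.read()

print(contents)
```

After the three writes the file holds only the second and third lines, because the second "w" open truncated the file before writing.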

So, I assume you have a list of links like this:

["http://example.com", "http://stackoverflow.com"]

and you want a file like this:

http://example.com:
<!doctype html>
<html>
<body>
...
<h1>Example Domain</h1>
...
</body>
</html>

http://stackoverflow.com:
...

Let's start by iterating over all the links:

for url in yourlinks:

First, you want to write the URL to the file:

    with open("yourfile.txt", "a") as f:
        f.write(url+"\n") # the \n is a new line

Now download the site's content into a variable:

        content = urllib2.urlopen(url).read()

(There may be errors here because of encoding; I come from Python 3.) Then you write it to the file:

        f.write(content+"\n")

Voilà! You should now have your file.
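For anyone on Python 3, the steps above could be put together roughly as follows. This is only a sketch under a few assumptions: urllib.request stands in for urllib2, pages are decoded as UTF-8 with undecodable bytes replaced, and the names fetch_text and dump_pages are made up for this example.

```python
from urllib.request import urlopen


def fetch_text(url):
    """Download a page and return it as str.

    Assumes UTF-8; bytes that do not decode are replaced rather than raising.
    """
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def dump_pages(urls, out_path, fetch=fetch_text):
    """Append each URL, then the content fetched for it, to out_path."""
    # Open once in append mode so earlier content in the file is preserved.
    with open(out_path, "a", encoding="utf-8") as f:
        for url in urls:
            f.write(url + ":\n")          # the URL as a heading line
            f.write(fetch(url) + "\n\n")  # page content, then a blank line


# Example call (performs real network requests):
# dump_pages(["http://example.com", "http://stackoverflow.com"], "yourfile.txt")
```

Decoding the downloaded bytes before writing is what avoids the encoding error mentioned above, since in Python 3 read() returns bytes and cannot be concatenated with "\n" directly. The fetch parameter is there only so the download step can be swapped out, for example when trying the function without network access.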

Answer 2

There is no div with the class links on that page, so this loop has no items to iterate over:

for link in subtitles

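You open the output file in write mode before attempting to loop over the results, so the file is created (or emptied) regardless, and you always end up with an empty file.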
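One way to guard against this is to collect the links first and only open the file once there is something to write. The sketch below uses the standard library's html.parser instead of BeautifulSoup purely so it runs without third-party packages; the same "check before opening" idea applies to the list returned by soup.find_all. The names DivLinkParser, extract_links, and save_links are hypothetical.

```python
from html.parser import HTMLParser


class DivLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside <div class=div_class>."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class
        self.div_stack = []  # one bool per open <div>: does it carry div_class?
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            self.div_stack.append(self.div_class in classes)
        elif tag == "a" and any(self.div_stack) and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.div_stack:
            self.div_stack.pop()


def extract_links(html, div_class):
    """Return all hrefs found inside divs with the given class."""
    parser = DivLinkParser(div_class)
    parser.feed(html)
    return parser.links


def save_links(links, path):
    """Write one link per line; skip the file entirely if nothing matched."""
    if not links:
        print("No links found; check the tag and class you are searching for.")
        return 0
    with open(path, "w") as f:
        for url in links:
            f.write(url + "\n")
    return len(links)
```

With this arrangement an empty result produces a visible message instead of a silently empty file, which makes a wrong selector much easier to spot.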

How do I write links to a text file?
