Python: read URLs from a file and write the results to a file

Time: 2015-03-07 17:19:32

Tags: python

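I have a list of URLs in a text file, and for each one I want to fetch the article text, the author, and the article title. Once I have those three elements, I want to write them to a file. So far I can read the URLs from the text file, but Python prints all the URLs and then only one article (the last one). How can I rewrite my script so that Python reads and writes the content for every URL?

I have the following Python script (version 2.7 - Mac OS X Yosemite):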

from newspaper import Article

f = open('text.txt', 'r') #text file containing the URLS
for line in f:
    print line

url = line
first_article = Article(url)
first_article.download()

first_article.parse()

# write/append to file 
with open('anothertest.txt', 'a') as f:
    f.write(first_article.title)
    f.write(first_article.text)

print str(first_article.title)

for authors in first_article.authors:
    print authors
if not authors:
    print 'No author'

print str(first_article.text)

1 answer:

Answer 0 (score: 0)

You are getting only the last article because you iterate over all the lines of the file here:

for line in f:
    print line

Once the loop has finished, line holds only the last value, and that is the single value the following statement uses:

url = line
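
To see this behaviour in isolation, here is a minimal sketch. The list of URLs is a hypothetical stand-in for the lines of text.txt, and nothing is downloaded:

```python
# Hypothetical stand-in for the lines read from text.txt
lines = [
    "http://example.com/a\n",
    "http://example.com/b\n",
    "http://example.com/c\n",
]

seen = []
for line in lines:
    # The loop body runs once per line
    seen.append(line.strip())

# Code placed after the loop runs only once, and `line` still
# holds whatever the final iteration assigned to it
url = line.strip()
print(url)
```

Anything that must happen for every URL therefore has to live inside the loop body, as seen.append() does here.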

If you move the rest of your code inside the loop, it runs once for every URL:

from newspaper import Article

with open('text.txt', 'r') as f:  # text file containing the URLs
    with open('anothertest.txt', 'a') as fout:
        for line in f:
            # strip() removes the trailing newline and any whitespace
            # around the URL
            url = line.strip()
            if not url:
                continue  # skip blank lines
            print("URL Line: {}".format(url))

            article = Article(url)
            article.download()
            article.parse()

            # write/append to file; encode because title and text may be
            # unicode strings in Python 2, and end each with a newline
            # so consecutive articles do not run together
            fout.write(article.title.encode('utf-8') + '\n')
            fout.write(article.text.encode('utf-8') + '\n')

            # format first, then encode, to avoid an implicit ASCII decode
            print(u"Title: {}".format(article.title).encode('utf-8'))

            # print authors only if there are authors to show.
            if len(article.authors) == 0:
                print('No author!')
            else:
                for author in article.authors:
                    print(u"Author: {}".format(author).encode('utf-8'))

            print("Text of the article:")
            print(article.text.encode('utf-8'))

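I also made a few changes to improve your code:

  • used with open() for reading the input file as well, so that the file descriptor is properly released when you no longer need it;
  • named the output file fout to avoid shadowing the first file;
  • made the open() call for fout once, before entering the loop, to avoid opening and closing the file on every iteration;
  • checked the length of article.authors instead of testing whether authors exists: authors is only assigned inside the for loop, so it will not exist if the loop is never entered, which is what happens when article.authors is empty.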
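
The last point can be sketched as follows. Both function names and the sample author are hypothetical, chosen only for illustration; the first function mirrors the pattern from the question and the second the check used in the answer:

```python
def report_authors_buggy(author_list):
    # Pattern from the question: `author` is only bound inside the loop
    for author in author_list:
        pass
    # With an empty list the loop body never runs, so `author` was
    # never assigned and this line raises an error
    if not author:
        return 'No author'
    return author


def report_authors(author_list):
    # Pattern from the answer: test the list itself before looping
    if len(author_list) == 0:
        return ['No author!']
    return ['Author: {}'.format(a) for a in author_list]


try:
    report_authors_buggy([])
except NameError as exc:  # UnboundLocalError is a subclass of NameError
    print('buggy version failed: {}'.format(type(exc).__name__))

print(report_authors([]))
print(report_authors(['Jane Doe']))
```

At module level, as in the original script, the same mistake surfaces as a plain NameError rather than UnboundLocalError, but the cause is identical.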

HTH