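Only the last item in the list is saved

Time: 2017-06-26 16:35:13

Tags: python html list parsing beautifulsoup

This code is part of a larger file. When I run it, the output of the for loop is exactly what I am looking for and it prints fine; however, only the text of the last item is saved to the .txt file. I am new to Python and I suspect this is a simple beginner's mistake, but I am stumped. I thought creating a fresh file at the top would fix it, but no luck.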

with open("all_ctrl_pk_articles.txt","w") as f:
        f.write("")
        for url in ctrl_pk_list:
            re = requests.get(url)
            soup = BeautifulSoup(re.content, "html.parser")
            g_data = soup.find_all("div", {"class": "story-body-supplemental"})
            for item in g_data:
                print item.contents[1].text #WANT TO SAVE THIS TEXT
                source_code = requests.get(url)
                plain_text = source_code.text
                soup = BeautifulSoup(plain_text, "html.parser")
                #print soup.text
                newsoup = soup.text
           f.write(newsoup)

        with io.open("all_ctrl_pk_articles.txt","a", encoding = "utf-8") as f:
           f.write(newsoup)
    f.close()

1 answer:

Answer 0 (score: 0)

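Your problem is that you are not writing to the file inside the loop: newsoup is reassigned on every iteration, so the single f.write(newsoup) after the loop only ever sees the last value. (You also have to change the mode from "write" to "append"; otherwise the file is overwritten each time it is opened.)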
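
To make the difference concrete, here is a minimal sketch with no network access. The items list and the file names buggy.txt and fixed.txt are hypothetical stand-ins, not part of the OP's script; only the position of the write call differs between the two loops.

```python
import io
import os
import tempfile

# Hypothetical stand-ins for the text scraped from each page.
items = [u"first", u"second", u"third"]
tmp_dir = tempfile.mkdtemp()

# Buggy shape: the write sits outside the loop, so it runs once,
# after `text` has already been rebound to the last item.
buggy_path = os.path.join(tmp_dir, "buggy.txt")
with io.open(buggy_path, "w", encoding="utf-8") as f:
    for item in items:
        text = item
    f.write(text + u"\n")

# Fixed shape: the write sits inside the loop, so it runs once per item.
fixed_path = os.path.join(tmp_dir, "fixed.txt")
with io.open(fixed_path, "w", encoding="utf-8") as f:
    for item in items:
        text = item
        f.write(text + u"\n")

# Read both files back to compare what was actually saved.
with io.open(buggy_path, encoding="utf-8") as f:
    buggy_contents = f.read()
with io.open(fixed_path, encoding="utf-8") as f:
    fixed_contents = f.read()

print(repr(buggy_contents))
print(repr(fixed_contents))
```

The first file should end up holding only the last item, while the second holds one line per item.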

I hope this answers the OP's question:

import io

import requests
from bs4 import BeautifulSoup

# ctrl_pk_list is the list of URLs built elsewhere in the larger file.
# Open the file once, in append mode, with an explicit encoding.
with io.open("all_ctrl_pk_articles.txt", "a", encoding="utf-8") as f:
    for url in ctrl_pk_list:
        # Renamed from "re" so the standard re module is not shadowed.
        response = requests.get(url)
        soup = BeautifulSoup(response.content, "html.parser")
        g_data = soup.find_all("div", {"class": "story-body-supplemental"})
        for item in g_data:
            text = item.contents[1].text  # the text the OP wants to save
            print(text)
            f.write(text + u"\n")  # Put this in the loop
# No f.close() and no second open are needed: the with block closes the file.
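
The remark about "write" versus "append" can be checked in isolation. The sketch below is only an illustration: the helpers write_items and read_all and the file name modes.txt are hypothetical and do not appear in the OP's script. It shows that "w" truncates the file each time it is opened, while several writes under one open all survive.

```python
import io
import os
import tempfile

# Hypothetical scratch file used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")


def write_items(mode, items):
    """Open the file once in the given mode and write one line per item."""
    with io.open(path, mode, encoding="utf-8") as f:
        for item in items:
            f.write(item + u"\n")


def read_all():
    """Return the whole file as one string."""
    with io.open(path, encoding="utf-8") as f:
        return f.read()


write_items("w", [u"a", u"b"])  # one open, two writes: both lines are kept
after_single_open = read_all()

write_items("w", [u"c"])  # reopening with "w" discards the earlier lines
after_reopen_w = read_all()

write_items("a", [u"d"])  # reopening with "a" adds to the end instead
after_reopen_a = read_all()

print(repr(after_single_open))
print(repr(after_reopen_w))
print(repr(after_reopen_a))
```

So if the file is opened only once around the whole loop, either mode keeps every item from that run; "a" matters when the file is reopened or when earlier runs should be preserved, at the cost of duplicated content if the script is run twice.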