Storing scraped data in a text file in Python

Date: 2019-12-28 00:04:06

Tags: python web-scraping

I was able to scrape data using BeautifulSoup, and now I want to generate a file containing all the data I scraped with it.

file = open("copy.txt", "w")
data = soup.get_text()
file.write(data)
file.close()

The text file does not contain the tags or the full content. Any ideas on how to achieve that?

2 Answers:

Answer 0 (score: 1)

You can use:

with open("copy.txt", "w") as file:
    file.write(str(soup))
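
If you want the saved markup to be easier to read, BeautifulSoup's `prettify()` method returns the same document re-indented with one tag per line. A minimal sketch, using a small inline document in place of a scraped page:

```python
from bs4 import BeautifulSoup

# Inline HTML stands in for the page you scraped.
soup = BeautifulSoup("<html><body><p>Hello</p></body></html>", "html.parser")

# prettify() keeps every tag but re-indents the markup for readability.
with open("copy.txt", "w", encoding="utf-8") as file:
    file.write(soup.prettify())
```

Passing `encoding="utf-8"` to `open` avoids encoding errors on pages containing non-ASCII characters.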

If you have a list of URLs to scrape and want to store each scraped page in a separate file, you can try:

my_urls = [url_1, url_2, ..., url_n]
for index, url in enumerate(my_urls):
    # .............
    # some code to scrape 
    with open(f"scraped_{index}.txt", "w") as file:
        file.write(str(soup))

Answer 1 (score: 0)

Quick solution:

You just need to convert the soup to a string. Using a test site, in case anyone else wants to follow along:

from bs4 import BeautifulSoup as BS
import requests

r = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
soup = BS(r.content, "html.parser")

file = open("copy.txt", "w") 
file.write(str(soup))
file.close()

Better solution:

Better practice is to use a context manager for file IO (using with):

from bs4 import BeautifulSoup as BS
import requests

r = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
soup = BS(r.content, "html.parser")

with open("copy.txt", "w") as file:
    file.write(str(soup))
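
The reason the original attempt lost the tags: `get_text()` returns only the concatenated text nodes, while `str(soup)` serializes the full markup. A small sketch of the difference:

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("<div><p>Hi</p></div>", "html.parser")

print(soup.get_text())  # text content only
print(str(soup))        # full markup, tags included
```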