I'm able to scrape data with BeautifulSoup, and now I want to generate a file that contains all of the data I scraped with Beautiful Soup.
file = open("copy.txt", "w")
data = soup.get_text()
data
file.write(soup.get_text())
file.close()
In the text file I can't see all the tags and the full content. Any ideas on how to achieve this?
Answer 0 (score: 1)
You can use:
with open("copy.txt", "w") as file:
    file.write(str(soup))
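The key difference: soup.get_text() returns only the visible text with every tag stripped out, while str(soup) returns the full HTML, tags included. A minimal sketch illustrating this on a tiny document (the sample HTML below is purely for illustration, not from the original question):

from bs4 import BeautifulSoup

# A tiny sample document, purely for illustration.
html = "<html><body><p>Hello <b>world</b></p></body></html>"
soup = BeautifulSoup(html, "html.parser")

print(soup.get_text())  # "Hello world" -- text only, markup stripped
print(str(soup))        # the full markup, tags included

# Writing str(soup) therefore keeps the tags in the output file.
with open("copy.txt", "w") as file:
    file.write(str(soup))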
If you have a list of URLs to scrape and want to store each scraped page in a different file, you can try:
my_urls = [url_1, url_2, ..., url_n]
for index, url in enumerate(my_urls):
    # .............
    # some code to scrape
    with open(f"scraped_{index}.txt", "w") as file:
        file.write(str(soup))
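For completeness, here is a runnable sketch of that loop, assuming each page is fetched with requests and parsed with BeautifulSoup; the URLs below are placeholders, not part of the original answer:

import requests
from bs4 import BeautifulSoup

# Placeholder URLs, purely for illustration.
my_urls = [
    "https://webscraper.io/test-sites/e-commerce/allinone",
    "https://webscraper.io/test-sites/e-commerce/static",
]

for index, url in enumerate(my_urls):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")
    # One output file per scraped URL.
    with open(f"scraped_{index}.txt", "w") as file:
        file.write(str(soup))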
Answer 1 (score: 0)
Quick solution:
You just need to convert the soup to a string. For anyone else who wants to follow along, this uses a test site:
from bs4 import BeautifulSoup as BS
import requests
r = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
soup = BS(r.content)
file = open("copy.txt", "w")
file.write(str(soup))
file.close()
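A quick way to check that the markup actually made it into the file (an optional verification, not part of the original answer):

with open("copy.txt") as f:
    contents = f.read()

print(contents[:200])          # the first characters should be the page's markup
print("<title>" in contents)   # True if the title tag was preserved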
Better solution:
A better practice is to use a context manager for file I/O (i.e., the with statement):
from bs4 import BeautifulSoup as BS
import requests
r = requests.get("https://webscraper.io/test-sites/e-commerce/allinone")
soup = BS(r.content)
with open("copy.txt", "w") as file:
    file.write(str(soup))
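One caveat worth adding (not part of the original answer): if the scraped page contains characters outside the platform's default encoding, the write can fail with a UnicodeEncodeError on some systems, notably Windows. Passing an explicit encoding to open() is a common safeguard:

with open("copy.txt", "w", encoding="utf-8") as file:
    file.write(str(soup))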