Question

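I recently started learning and writing Python, and I'm currently developing a web scraper. I want to scrape data from multiple websites and save it in JSON file format. At the moment it only prints the search results; I want the scraped data saved to a JSON file instead. I'm writing this code but get a "not JSON serializable" error, and nothing is written to the file named filename. I'm using Python 2.7.14 in the Mac terminal. Below is the Scraper.py file.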

from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip
import json

urls = ['http://www.ctex.cn', 'http://www.ss-gate.org/']
#scrape elements
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    #open the file "filename" in write ("w") mode
    file = open("filename", "w")
    json_data = json.dumps(my_list,file)
    #json.dump(soup, file)
    file.close()

I also tried different code, but it still doesn't write anything to the filename file. The error is again "not JSON serializable". Below is the Scraper2.py file.

from bs4 import BeautifulSoup
import requests
import pprint
import re
import pyperclip

urls = ['http://www.ctex.cn', 'http://www.ss-gate.org/']
#scrape elements
for url in urls:
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    #print(soup)

import json
# open the file "filename" in write ("w") mode
file = open("filename", "w")
#output = soup
# dumps "output" encoded in the JSON format into "filename"
json.dump(soup, file)
file.close()

Answer 1

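Logic

Your question is a bit ambiguous, because I'm not sure whether you want to save the request's result (the raw response) or the parser's result (the parsed data). It's best not to mix the two.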

Technique

HTML does not map cleanly onto JSON.
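
To illustrate why the "not JSON serializable" error appears, here is a minimal sketch. The json module only knows how to encode basic types (dict, list, string, number, boolean, None), so handing it an arbitrary object fails. The Page class and the to_json helper below are hypothetical stand-ins, not part of any library; Page plays the role of a parsed-document object such as the soup in the question, and converting it to a plain string first (for a real soup, something like str(soup)) is what makes it serializable.

```python
import json


class Page(object):
    """Hypothetical stand-in for a parsed-document object such as a soup."""

    def __init__(self, html):
        self.html = html

    def __str__(self):
        return self.html


def to_json(obj):
    """Return a JSON string, or a short message if obj cannot be serialized."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return "TypeError: {}".format(exc)


page = Page("<html><title>Example</title></html>")
print(to_json(page))       # fails: json cannot encode an arbitrary object
print(to_json(str(page)))  # works: a plain string is a JSON-compatible type
```
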
I suggest two ways to solve this.

Save each page's text as an HTML file

You can save response.text (not response.content) to an HTML file, like this:

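import io
import requests

urls = ['http://www.ctex.cn', 'http://www.ss-gate.org/']
for index, url in enumerate(urls):
    res = requests.get(url)
    # one file per URL, so later pages do not overwrite earlier ones;
    # io.open accepts an encoding on both Python 2.7 and Python 3
    with io.open('page_{}.html'.format(index), 'w', encoding='utf-8') as html_file:
        html_file.write(res.text)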

or

Save multiple results to a JSON file

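import json
import requests

urls = ['http://www.ctex.cn', 'http://www.ss-gate.org/']
out_list = []
for url in urls:
    res = requests.get(url)
    # res.text is a plain string, which json can serialize
    out_list.append(res.text)
# json.dump (not json_dump) writes the whole list to the open file
with open('out.json', 'w') as json_file:
    json.dump(out_list, json_file)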

Then write a separate program to parse them.

Good luck!
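
As a rough sketch of what that separate parsing program could look like, the example below reads a list of HTML strings and extracts each page's title into JSON-friendly dicts. It assumes Python 3 and uses the standard library's HTMLParser so it runs without extra packages; BeautifulSoup would work just as well. TitleParser, extract_titles, parse_file and the sample pages are hypothetical names and data chosen for illustration.

```python
import json
from html.parser import HTMLParser  # Python 3 module name (assumption)


class TitleParser(HTMLParser):
    """Collects the text found inside <title> elements."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def extract_titles(pages):
    """Return one {'title': ...} dict per HTML string in pages."""
    results = []
    for html in pages:
        parser = TitleParser()
        parser.feed(html)
        results.append({"title": parser.title.strip()})
    return results


def parse_file(in_path, out_path):
    """Read a JSON list of HTML strings and write the extracted titles as JSON."""
    with open(in_path, encoding="utf-8") as src:
        pages = json.load(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        json.dump(extract_titles(pages), dst, ensure_ascii=False, indent=2)


# Inline sample standing in for the contents of out.json
sample = ["<html><head><title> First </title></head></html>", "<p>no title</p>"]
print(extract_titles(sample))
```

Calling parse_file('out.json', 'titles.json') would apply the same extraction to the file written by the scraper; titles.json is an assumed output name. Because the result is built from plain dicts and strings, json.dump can write it without the serialization error.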

How do I save Python web scraper output in JSON file format?
