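I recently started learning Python and I'm currently building a web crawler. Right now it only prints the search results; what I want is for it to save the data to a JSON file.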
import json

import requests
from bs4 import BeautifulSoup

url = "http://www.alternate.nl/html/product/listing.html?navId=11622&tk=7&lk=9419"
r = requests.get(url)
# name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r.content, "html.parser")

g_data = soup.find_all("div", {"class": "listRow"})
for item in g_data:
    try:
        print(item.find_all("span", {"class": "name"})[0].text)  # 1
        print(item.find_all("span", {"class": "additional"})[0].text)  # 2
        print(item.find_all("span", {"class": "info"})[0].text)  # 3
        print(item.find_all("span", {"class": "info"})[1].text)  # 4
        print(item.find_all("span", {"class": "info"})[2].text)  # 5
        print(item.find_all("span", {"class": "price right right10"})[0].text)  # 6
    except IndexError:
        # a row is missing one of the expected spans; skip the rest of it
        pass
This is the output I would like to get:
{"product1":[{"1":"itemfindallresults1"},{"2":"itemfindallresults2"}]} etc
So how can I do this? Thanks in advance.
Answer 0 (score: 1)
Basic JSON usage looks like this:
import json

# just an example dictionary to be dumped into "filename"
output = {"stuff": [1, 2, 3]}

# open the file "filename" in write ("w") mode; the with block closes it afterwards
with open("filename", "w") as f:
    # dumps "output" encoded in the JSON format into "filename"
    json.dump(output, f)
Hope this helps.
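To check that the file really contains what you expect, you can read it back with json.load. The following is only a rough sketch, assuming Python 3; the file name products.json and the sample values are made up for illustration and do not come from the site in the question.

```python
import json
import os
import tempfile

# hypothetical sample data in the shape requested in the question
output = {"product1": [{"1": "Example name"}, {"2": "€ 19,99"}]}

# write into a temporary directory so the sketch does not touch real files
path = os.path.join(tempfile.mkdtemp(), "products.json")

# indent=2 pretty-prints the file; ensure_ascii=False keeps characters
# such as the euro sign readable instead of escaping them
with open(path, "w", encoding="utf-8") as f:
    json.dump(output, f, indent=2, ensure_ascii=False)

# read the file back to confirm that the data round-trips unchanged
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded == output)
```

Both indent and ensure_ascii are optional; leaving them out produces a compact, ASCII-only file that json.load reads back just the same.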
Answer 1 (score: 0)
A simple program that does what you asked for.
import json

import requests
from bs4 import BeautifulSoup

url = "http://www.alternate.nl/html/product/listing.html?navId=11622&tk=7&lk=9419"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

# a custom Product object would need its own class and is not JSON
# serializable, so each row is collected into plain lists and dicts instead
output = {}
g_data = soup.find_all("div", {"class": "listRow"})
for item in g_data:
    try:
        info = item.find_all("span", {"class": "info"})
        fields = [
            item.find_all("span", {"class": "name"})[0].text,
            item.find_all("span", {"class": "additional"})[0].text,
            info[0].text,
            info[1].text,
            info[2].text,
            item.find_all("span", {"class": "price right right10"})[0].text,
        ]
    except IndexError:
        # skip rows that lack one of the expected spans
        continue
    # keys "1" to "6" follow the numbering used in the question
    key = "product%d" % (len(output) + 1)
    output[key] = [{str(n): text} for n, text in enumerate(fields, start=1)]

with open("filename", "w") as f:
    json.dump(output, f)
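If you want to try the restructuring step without hitting the website, something like the sketch below might help. It is only an illustration: build_products is a hypothetical helper, and the rows list is made-up data standing in for the .text values of the scraped spans.

```python
import json


def build_products(rows):
    """Turn a list of rows (each a list of strings) into
    {"product1": [{"1": ...}, {"2": ...}], "product2": [...]}."""
    products = {}
    for product_number, fields in enumerate(rows, start=1):
        products["product%d" % product_number] = [
            # strip() removes the surrounding whitespace scraped text often has
            {str(field_number): text.strip()}
            for field_number, text in enumerate(fields, start=1)
        ]
    return products


# hypothetical values standing in for the text of each scraped span
rows = [
    ["Name A", " extra A ", "€ 10,00"],
    ["Name B", "extra B", "€ 20,00"],
]

products = build_products(rows)
print(json.dumps(products, indent=2, ensure_ascii=False))
```

In the crawler you would append one list of strings per listRow to rows inside the loop, then pass the result of build_products(rows) to json.dump.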